DOCUMENT RESUME

ED 225 212                                              CS 504 085

TITLE           Speech Research: A Report on the Status and Progress
                of Studies on the Nature of Speech, Instrumentation
                for Its Investigation, and Practical Applications,
                Status Report, July 1-December 31, 1982.
INSTITUTION     Haskins Labs., New Haven, Conn.
SPONS AGENCY    National Institutes of Health (DHHS), Bethesda, Md.;
                National Inst. of Child Health and Human Development
                (NIH), Bethesda, Md.; National Inst. of Neurological
                and Communicative Disorders and Stroke (NIH),
                Bethesda, Md.; National Science Foundation,
                Washington, D.C.; Office of Naval Research,
                Washington, D.C.
REPORT NO       SR-71/72 (1982)
PUB DATE        82
CONTRACT        NICHHD-N01-HD-1-2420; ONR-N00014-83-C-0083
GRANT           NICHHD-HD-01994; NICHHD-HD-16591; NIH-RR-05596;
                NINCDS-NS13617; NINCDS-NS13870; NINCDS-NS18010;
                NSF-BNS-8111470; NSF-PRF-8006144
NOTE            356p.
PUB TYPE        Reports - Research/Technical (143) -- Collected Works
                - Conference Proceedings (021)
EDRS PRICE      MF01/PC15 Plus Postage.
DESCRIPTORS     Adults; *Articulation (Speech); *Auditory Perception;
                Children; Cognitive Processes; *Communication
                Research; Language Research; Memory; *Oral Language;
                Reading Processes; Reading Research; Research
                Methodology; *Speech Instruction; Speech Pathology;
                *Speech Skills; Stress (Phonology); Visual
                Perception; Vowels

ABSTRACT 

Research reports on the nature of speech, 
CONVERGING SOURCES OF EVIDENCE ON SPOKEN AND PERCEIVED RHYTHMS OF SPEECH:
CYCLIC PRODUCTION OF VOWELS IN SEQUENCES OF MONOSYLLABIC STRESS FEET*



Carol Fowler*



Abstract. The manuscript reviews the literature from psychology,
phonetics, and phonology bearing on production and perception of
syllable timing in speech. A review of the psychological and
phonetics literature suggests that the production of vowels and
consonants is interleaved in syllable sequences in such a way that
vowel production is continuous, or nearly so. Based on that
literature, an hypothesis is developed concerning the perception of
syllable timing assuming that vowel production is continuous.


The hypothesis is that perceived syllable timing corresponds to
the timed sequencing of the vowels as produced and not to the timing
either of vowel onsets as conventionally measured, or of
syllable-initial consonants. Three experiments support the hypothesis. One
shows that information present during the portion of an acoustic
signal in which a syllable-initial consonant predominates is used by
listeners to identify the vowel. Compatibly, this information for
the vowel contributes to the vowel's perceived duration. Finally, a
measure of the perceived timing of a syllable correlates significantly
with the time required to identify syllable-medial vowels but
not with time to identify the syllable-initial consonants.

Further support for the proposed mode of vowel-consonant
production and perception is derived from the literature on phonology.
Language-specific phonological conventions can be identified that
may reflect exaggerations and conventionalizations of the articulatory
tendency for vowels to be produced continuously in speech.

To their speaker/hearers, both naive (Donovan & Darwin, 1979; Lehiste,
1972) and expert (Abercrombie, 1964; Classe, 1939; Pike, 1945), languages
sound rhythmical. The term "rhythm" as applied to speech refers generally to
an ordered recurrence of strong and weak elements. In this general sense,
languages clearly are rhythmical: consonants and vowels approximately
alternate and, in stress languages such as English, so do stressed and unstressed
syllables. However, attempts to validate the intuition that speech is
rhythmical have focused on recurrence defined temporally--in particular, on
the question whether the regular recurrence of certain spoken units is
isochronous.

Three classes of rhythm have been proposed for languages: They are
stress timing (English, Swedish), syllable timing (Spanish, Italian, French),
and mora timing (Japanese). In rhythmical utterances, a unit of speech--the
stress foot, the syllable, or the mora--is said to be regulated temporally,
so that onset-onset intervals between units are approximately isochronous. In
a stress-timed language, for example, intervals between onsets of stressed
syllables are said to approach isochrony even though some intervals may be
monosyllabic and others di- or trisyllabic (e.g., Abercrombie, 1964; Catford,
1977; Classe, 1939; Pike, 1945).

The bases for linguists' and other listeners' impressions of isochronous
rhythms in speech are unknown. However, it is known that, with the possible
exception of mora timing in Japanese (e.g., Dalby & Port, 1981; Han, 1962),
the basis is not acoustic isochrony, or, in stress-timed languages, even near
isochrony, of the intervals that have been proposed as relevant. English is
probably the most studied language in this regard, and many researchers have
reported large departures from measured acoustic isochrony of stress feet in
spontaneous (Lea, Note 1; Shen & Peterson, 1962) and more constrained (Classe,
1939; Lehiste, 1972) utterances.

It is unlikely, then, that any units of naturally produced speech are
realized isochronously. In view of that, the interesting questions to ask now
are where the impression of rhythmicity comes from, whether recurrence of any
of the units of speech that do recur is perceptually significant, whether it
is linguistically significant, and whether it is articulatorily significant.
Evidence bearing on these questions derives from research reported in the
psychological literature and the linguistics literature on phonetics and
phonology. This manuscript and one following are intended to bring together
these research lines and thereby to assess the state of our understanding of
spoken and perceived rhythms of speech.

The two papers in the series differ in scope. The current one considers
only monosyllable utterances in which all syllables are stressed (e.g., from
Bolinger: "You make John tell who stole that calf"). The reason for this
narrow focus is that fairly extensive but disparate lines of research--in
psychology relating to perception, in phonetics concerning articulation, and
in phonology concerning structure in sound sequences--converge to suggest a
coherent perspective on rhythmic speech production and on perception of
rhythmic speech in an idealized stress-timed language where feet are
monosyllabic. Less extensive lines of research provide a less coherent picture of
production and perception of speech where unaccented syllables are produced.
This latter literature is the subject of the second manuscript.

In the present paper, discussion is limited also in a second way.
Initially, I consider ways in which talkers comply with instructions to
produce stress (syllable)-timed speech and the ways in which listeners assess
those productions. Before it is possible to draw realistic conclusions
concerning rhythms that may or may not underlie production of spoken
languages, and before we can ascertain whether the impression of rhythm is
realistic or illusory, it is imperative that we learn how to recognize rhythm
in speech when it occurs.

I will first review the literature concerning production and perception
of sequences of monosyllabic stress feet. The literature under review
suggests two conclusions, one concerning the production of vowels in fluent
speech and one concerning their perception. These proposals are tested in a
series of three experiments.

In the second part of the paper, I introduce evidence from the
linguistics literature on phonology that may converge with the experimental evidence
reviewed or presented in Part I. In Part II, I attempt to introduce and
defend three basic ideas. One is the general idea that direct investigation
of linguistic structure can provide a useful source of converging evidence
with that provided by experimental investigations of language use. The second
is the more specific idea that some phonological rules can be identified as
exaggerations and conventionalizations of articulatory dispositions, and as
such, can provide converging evidence for the identity of dispositions.
Third, I attempt to identify several instances of phonological rules that are
"natural" (that is, reflect articulatory dispositions) if the manner of vowel
production proposed in Part I is in fact an articulatory disposition.

In the final part of the paper, conclusions are drawn from the array of
findings reviewed and presented in Parts I and II. 

PART I. MONOSYLLABIC STRESS FEET 

The Perceptual Evidence and Some Articulatory Correlates

Several years ago, Morton, Marcus, and Frankish (1976; see also Marcus,
1981) reported a systematic discrepancy between the measured timing of a
sequence of digits and its perceived timing. In particular, they found that
sequences of digits with acoustically isochronous onset-onset intervals sound
unevenly timed to listeners. Given an opportunity to adjust the intervals
between digits until the timing sounds isochronous, listeners introduce
systematic departures from measured acoustic isochrony. This finding is
almost complementary to one reported by Lehiste (197^) and others (Donovan &
Darwin, 1979) on listeners' perceptions of sentential rhythms. This
literature (reviewed in Fowler, Note 1) reports that listeners may fail to detect
departures from measured isochrony in spoken sentences. Although this latter
collection of studies is interpreted as revealing listener insensitivity to
foot durations, the findings by Morton et al. cannot have that interpretation.
Indeed, taken together, the two sets of findings suggest that listeners'
impressions of speech timing are not based on the same intervals measured by
investigators. This was the interpretation offered by Morton et al. of their
own findings.

An investigation of talkers' productions of isochronous sequences
suggests one important difference between measured and perceived rhythmic
intervals. In particular, the latter but not the former sometimes can be
identified with rhythmic articulatory intervals (Fowler, 1979; Fowler &
Tassinary, 1981). Asked to produce isochronous sequences of monosyllables,
talkers produce sequences with just the measured departures from isochrony
that listeners require in order to hear the sequences as evenly timed (Fowler,
1979).

This research indicates that talkers' and listeners' notions of
rhythmicity in speech agree but differ from those of experimenters. Such a pattern
of agreement and disagreement invites two interpretations. One is that
talkers and listeners are subject to an illusion that experimenters, working
on visible rather than audible displays of speech, evade. Another is that
talkers produce rhythmic speech on request in these studies and listeners
recognize it as such. For their part, experimenters fail to detect the
rhythmicity because their experimental measurements somehow fail to reflect
the natural structure of the spoken sequences. The latter is the more
conservative of the two views because it ascribes no special processes or
behaviors to listeners and talkers. The talker is assumed simply to follow
instructions and the listener to detect the natural structure of the acoustic
signal provided by the talker. In addition, this interpretation appears a
realistic one in view of the well-known difficulties involved in the
measurement of speech because it is coarticulated.

From the perspective of this second interpretation, assessments of the
rhythmic structure of naturally produced speech sequences will be inaccurate
until experimenters discover what counts as rhythmicity for talkers and
listeners. This best can be determined to begin with, perhaps, by studies in
which talkers are asked to produce sequences with specified timing and their
performances are examined.

In the study by Fowler (1979), talkers produced sequences consisting of a
pair of rhyming consonant-vowel-consonant (CVC) syllables in alternation (for
example, /bad sad bad.../). In these sequences, talkers produced long
intervals between measured acoustic onsets of syllables when the first syllable in
the interval began with a long-duration prevocalic segment. Indeed the
departures from measured isochrony of successive intervals could be predicted
very closely from differences in the measured durations of the
syllable-initial consonants. Figure 1 displays the relationship found in Fowler
(1979). The onset-onset time differences in these productions ranged from a
minimum of about 35 msec for sequences such as /mad nad.../ in which initial
consonants were similar in manner class to a maximum of about 200 msec when
consonants differed in manner and in other features (e.g., /bad sad.../).

Although measured vowel onsets tend to be aligned more evenly than onsets 
of acoustic energy for the initial consonants of the syllables, intervals 
between vowel onsets are not isochronous either; instead they show departures 
from isochrony complementary to those of syllable onsets.

Figure 1. Differences in duration of prevocalic acoustic energy in syllables
produced in alternation (Fowler, 1979) plotted as a function of
syllable onset-onset asynchrony. Data are from a single talker
instructed to produce the syllables evenly stressed and timed.
Paired letters on the figure refer to syllable-initial segments.
For example, (s-a) refers to utterance /sad ad.../.

Articulation may be isochronous in these productions, however. When
monosyllables in a sequence are rhyming CVCs, measures of intervals between
onsets of muscle activity involved in segment production have revealed
isochrony both of initial consonant and of vowel-related muscle activity.
This is found even in sequences showing substantial departures from measured
acoustic isochrony (Tuller & Fowler, 1980). For example, in a sequence
/bak fak bak.../, EMG activity of the orbicularis oris muscle involved in lip
closure was found to be isochronous; this implies that lip closures for /b/
and /f/ also were isochronous in these utterances. Necessarily, however,
acoustic intervals from stop release for /b/ to onset of frication for /f/
were shorter than the opposite intervals from frication to release. This
departure from isochrony of acoustic-energy onsets follows from the timing
relation between the consonant articulations and their acoustic correlates.
Consonants are produced in three broad phases: a closing phase, a closure
interval, and a release phase. During the closure interval for the stop
consonant /b/, the lips are shut and in stressed, syllable-initial position,
the signal is silent. A stop burst occurs on release of the closure in
the final phase of consonant production. In contrast to the stop consonant
/b/, the fricative /f/ has a noisy closure interval. During closure, the
lower lip approximates the upper teeth, but does not seal off the oral cavity
to the passage of air. Air passing through the narrow constriction produces
frication. Consequently, a talker who aligns closure phases of
syllable-initial stops and fricatives will produce syllables with systematically
anisochronous onsets of acoustic energy.
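
The timing relation just described can be illustrated with a minimal sketch.
It assumes a talker who begins each consonant closure at a fixed period, and
a lag from closure onset to the first acoustic energy that depends on the
consonant: long for the silent closure of /b/, short for the noisy closure
of /f/. The function names, the 500 msec period, and the lag values are
hypothetical, chosen only for illustration; they are not measurements from
the studies cited.

```python
def acoustic_onsets(consonants, period_ms, lag_ms):
    """Time of first acoustic energy for each syllable.

    Closure for syllable i is assumed to start at i * period_ms
    (isochronous articulation); acoustic energy begins lag_ms[c] later.
    """
    return [i * period_ms + lag_ms[c] for i, c in enumerate(consonants)]


def intervals(times):
    """Successive onset-onset intervals."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical lags (msec): /b/ is silent until release, /f/ is noisy
# almost from the start of its closure interval.
lag_ms = {"b": 80.0, "f": 10.0}
sequence = ["b", "f", "b", "f", "b"]  # as in /bak fak bak.../

onsets = acoustic_onsets(sequence, 500.0, lag_ms)
acoustic_intervals = intervals(onsets)
print(acoustic_intervals)  # [430.0, 570.0, 430.0, 570.0]
```

Under these assumptions the closures are perfectly isochronous, yet the
interval from /b/ release to /f/ frication is shorter than the interval from
/f/ frication to /b/ release, which is the direction of asymmetry reported
in the text.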

These studies suggest, then, that talkers comply with instructions to
produce isochronous monosyllables by producing isochronous articulations.
They do not try to compensate for the different times after articulatory onset
that different manner classes of consonant have acoustic consequences. For
their part, in these experiments, listeners only hear isochrony when
articulation is isochronous. They hear uneven timing when acoustic energy onsets of
different manner classes of consonants are aligned. We conclude, therefore,
that in these experiments listeners' perceptions of the rhythmic structure of
speech are based on their extraction of acoustic information specifying
articulatory timing (cf. Liberman, Cooper, Shankweiler, & Studdert-Kennedy,
1967). This conclusion is compatible with that drawn based on other evidence
(e.g., Fitch, Halwes, Erickson, & Liberman, 1980; Lehiste, 1970). For example
(Lehiste, 1970), listeners' judgments of the relative loudness of two vowels
correspond more closely to the articulatory effort required to produce them
than to their relative intensities.

The conclusion that perceived timing is produced timing does not tell the
whole story, however. Tuller and Fowler found isochrony
both of consonant- and of vowel-related muscle activity. A later experiment
(Fowler & Tassinary, 1981) showed that initial consonants are not always
articulated at isochronous intervals in sequences that talkers intend to be
isochronous. Figure 2 displays measurements of a set of syllables produced in
time to a metronome by three talkers (see Rapp, 1971, for similar data on
Swedish talkers, and Allen, 1972a, b, for analogous data on English
obtained using a different procedure). The location of the metronome pulse in
the CVCs is indicated by the vertical line at zero in the figure. Points
generally just to the left of the metronome pulse indicate the onset of
acoustic energy of the syllable. Points generally just to the right of the
pulse indicate the measured vowel onset, and points farther to the right
indicate measured vowel offset. By showing the alignment of rhyming syllables
with the metronome pulse, the figure also reveals how syllables are aligned in
relation to one another. The figure shows the effect reported by Morton et
al. (1976) and studied further by Fowler (1979) and by Tuller and Fowler
(1980). Acoustic energy onsets for fricatives are early relative to those for
voiced stops. Of interest here is another finding, however. Acoustic energy
onsets of intervals beginning with consonant clusters are early relative to
others. A talker producing the sequence /sad strad sad.../ in time with the
metronome does not produce isochronous acoustic onset-onset times--as he or
she would if /s/ production were initiated at temporally equidistant
intervals. Consequently, whatever the talker may have been producing rhythmically
in these utterances, it was not initial-consonant production.

Figure 2. Measures of syllables produced by talkers in Fowler and Tassinary
(1981) in time with a metronome. The vertical line at zero
represents the metronome pulse. Different syllables are plotted
top to bottom in the figure. The points generally to the left of
the line represent the onset of acoustic energy for each syllable
relative to the metronome pulse. Points generally just to the
right of the pulse represent the measured vowel onset (that is, the
onset of voiced oral formants for the vowel). Points to the far
right represent measured vowel offset (the beginning of closure for
final /d/).

Figure 3. Measured vowel shortening in the context of preceding (a) and
following (b) consonants in English.

The alignments are not related to the amplitude contours of the syllables
(Morton et al., 1976; Tuller & Fowler, 1981), or, apparently, to their
fundamental frequency contours (Rapp, 1971).

In this study, the only acoustic measure temporally equidistant from the
metronome pulse, and consequently isochronous in these productions, was the
measured vowel offset. This finding perhaps can be rationalized by examining
two separate research lines that investigate the temporal and articulatory
microstructure of syllables: studies of phonetic shortening and of
coarticulation.

The Temporal and Articulatory Microstructure of Syllables 

Figure 2 reveals a pattern of vowel shortening in the context of various
syllable-initial consonants. This pattern of shortening has been reported by
other investigators for other languages (e.g., Lindblom, Lyberg, & Holmgren,
1981). In Figure 2, the measured duration of the vowel shortens as that of
the prevocalic consonant or consonants increases in duration. Figure 3a
replots the shortening effects in Figure 2 beside others (3b) reflecting
effects of syllable-final consonants on vowel duration.² These data resemble
those reported by Lindblom, Lyberg, and Holmgren (1981) on speakers of Swedish
and show that a vowel's measured duration also shortens as syllable-final
consonants are added to the syllable.

Two interpretations of the shortening effects suggest themselves.
According to one, talkers attempt to maintain a constant syllable duration in
production (e.g., Shaffer, 1982). This might be a manifestation of a
syllable- or stress-timing tendency. If, for whatever reason, talkers are
trying to maintain a constant syllable duration, however, they are
unsuccessful, as Figure 2 reveals. An examination of the articulatory evidence suggests
a different interpretation.

In syllables, the production of consonants and vowels is
context-sensitive, usually in an assimilative way. The context-sensitivity, called
"coarticulation," occurs very generally in syllables (e.g., MacNeilage &
DeClerk, 1969). For example, closure for a /b/ followed or preceded by the
close vowel /i/ is achieved with a more closed jaw than that for /b/ followed
or preceded by the open vowel /a/ (Sussman, MacNeilage, & Hanson, 1973).
Similarly, the place of articulation of /k/ is fronted in the context of a
front vowel as compared to a back vowel (e.g., Perkell, 1969).

Coarticulation has various explanations in the literature. One
explanation, first proposed by Öhman (1966), appears to account for the
vowel-shortening effects just described as well as for the context-sensitivity of
segment production. Öhman proposes that syllable-initial and -final
consonants are superimposed on a vowel's leading and trailing edges. Moreover, in
a VCV disyllable, vowel-to-vowel gestures of the tongue body are produced
somewhat separately from articulatory gestures for the consonant. Öhman's
evidence for his rather counterintuitive view of disyllable production was
meager, but it has been substantiated by several subsequent studies. His
evidence derived from acoustic measures of implosive and explosive formant
transitions in VCV disyllables produced by a Swedish talker. In Öhman's
data, implosive transitions, representing the closing phases of voiced stop
production, were affected by both vowels in the disyllable. So were the
explosive transitions following consonant release. This seemed to indicate
diphthongal production of the two vowels in the disyllables during production
of the consonants.

Compatible articulatory data have been provided by several investigators.
Carney and Moll (1971) provide cinefluorographic tracings of tongue movements
during production of C1V1C2V2 disyllables in which the second consonant is a
fricative. They find movement of the tongue body from V1 to V2 during
production of C2. Similarly, Kent and Moll (1972) find indistinguishable
trajectories and velocities of the tongue moving from /i/ to /a/ in "he
monitored" and "he honored" even though in one but not the other utterance the
two vowels are separated by a bilabial consonant. Compatible findings are
reported by several other investigators (Barry & Kuenzel, 1975; Butcher &
Weiher, 1976; Perkell, 1969). This set of findings establishes the vowel as
the articulatory foundation of a syllable in the sense that it is produced
throughout the syllable's articulatory extent, and suggests that in VCVs,
(stressed) vocalic gestures are realized in relation to production of other
(stressed) vowels even if a consonant intervenes. In addition, this view of
vowel and consonant production may explain the measured shortening effects
that consonants exert on vowels.

Figure 4 illustrates the relationship between coarticulation and
shortening implied by these studies. The figure's horizontal dimension represents
time and its vertical dimension an abstract attribute, prominence. Prominence
refers at once to the extent to which vocal-tract activity is given over to
the production of a particular segment, and the extent to which the character
of the acoustic signal reflects articulatory gestures associated with the
segment. During the closure phase of a consonant, for example, the character
of the acoustic signal is largely determined by the consonant's manner and
place of closure; the signal is noisy if the segment is a fricative, silent if
it is a stop, and so on. Even though a coproduced vowel can influence the
signal during consonant closure, giving rise to the context-sensitivity of the
signal for the consonant, the voiced formant structure most characteristic of
vowels is absent during consonant closure. This is indicated in the figure by
giving the vowel a lesser degree of prominence than the consonant during
consonantal closure.

Measuring conventions locate segment boundaries approximately where
ordinal changes take place in the prominence of two segments. Thus, boundaries
delimit acoustic intervals during which an individual phonetic segment is the
most prominent one in the signal. (Moreover, ambiguities arise concerning
where a boundary should be located--for example, between a voiceless stop and
a vowel [e.g., Lisker, 1972]--when it is not obvious over a certain extent of
the signal which of two segments is predominant.) In the VCV depicted in
Figure 4, vowels would be given boundaries at "a" and "b" and at "d" and "e,"
while the consonant would extend from "b" to "d." If the consonant were
deleted and a VV were produced, the first vowel's measured extent would be
from "a" to "c" and the second vowel's from "c" to "e." Because of these
conventions, even if the vowels in the VCV and the VV had identical
articulatory extents, both would be measured to shorten in the VCV as compared
to the VV. A first-approximation hypothesis, however, in view of the
bidirectional coarticulation and shortening effects, would be that vowels do
not change their produced durations in consonantal contexts. Rather, the
consonants overlap them more or less. Although this most conservative
hypothesis almost certainly will have to undergo revision, it is the simplest
one to explain both coarticulation and shortening in syllables.
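
The measuring convention described above can be sketched in a few lines.
The sketch assumes that each segment has a prominence curve sampled at
common time points and that a boundary is placed wherever the identity of
the most prominent segment changes. The function name, the sample times,
and all prominence values are hypothetical and serve only to illustrate the
idea; they are not read off Figure 4.

```python
def segment_boundaries(times, prominence):
    """Return (time, old_segment, new_segment) wherever the most
    prominent segment changes between successive samples."""
    labels = list(prominence)

    def most_prominent(i):
        return max(labels, key=lambda label: prominence[label][i])

    boundaries = []
    previous = most_prominent(0)
    for i in range(1, len(times)):
        current = most_prominent(i)
        if current != previous:
            boundaries.append((times[i], previous, current))
            previous = current
    return boundaries


# Hypothetical VCV: the vowels are produced through much of the consonant,
# but with less prominence than the consonant during its closure.
times = [0, 50, 100, 150, 200, 250, 300]  # msec
prominence = {
    "V1": [1.0, 0.9, 0.6, 0.3, 0.1, 0.0, 0.0],
    "C":  [0.0, 0.2, 0.7, 0.9, 0.7, 0.2, 0.0],
    "V2": [0.0, 0.0, 0.1, 0.3, 0.6, 0.9, 1.0],
}

boundaries = segment_boundaries(times, prominence)
# Last sample at which V1 still has nonzero prominence (its produced extent).
v1_produced_end = max(t for t, p in zip(times, prominence["V1"]) if p > 0)
print(boundaries)        # [(100, 'V1', 'C'), (250, 'C', 'V2')]
print(v1_produced_end)   # 200
```

With these made-up curves, the first vowel is measured to end at 100 msec
even though it is still being produced at 200 msec, which is the sense in
which a consonant overlapping a vowel shortens its measured duration
without changing its produced duration.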

Now let us consider syllables produced in sequence. Öhman proposes that
in VCVs, transconsonantal vowels are produced as continuous diphthongal
gestures, to a first approximation, unperturbed by a medial consonant (see
also Kent & Moll, 1972). Extrapolation of this view to longer speech
sequences (at least to longer sequences of stressed syllables) suggests that
vowels are produced cyclically--that is, continuously, one after the other--
and constitute a somewhat separate articulatory stream from gestures involved
in consonant production.³

This hypothesis gives rise to the question how consonants might be timed
relative to the vowel stream. Some research by Tuller, Kelso, and Harris
(1982) suggests part of an answer. Across utterances of the form PV1CV2P,
produced at various rates with different stress patterns and two different
medial consonants, Tuller et al. found an invariant linear relationship
between duration of a vocalic cycle (that is, the interval between the onset
of muscle activity for V1 and that for V2) and the time lag between onsets of
activity for V1 and C. That is, timing of consonant production relative to
vowel production was invariant over substantial changes in the duration of a
vocalic cycle. The evidence suggests a strategy of initiating production of a
consonant at an invariant phase in the production of a vowel's cycle.
(Evidence of vowel shortening as consonants are added to a cluster implies,
however, that the critical phase in production of a vowel at which consonant
production is initiated would be different for the single consonants studied
by Tuller et al. than for clusters.) As Tuller et al. point out, preservation
of relative timing of muscle activity or gestures over changes in rate and
amplitude of movement is commonly observed across a variety of activities (for
example, handwriting: Hollerbach, 1980; Viviani & Terzuolo, 1980; Wing, 1978;
locomotion: Grillner, 1975; respiration: Grillner, 1977).
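
A linear relation of the kind described above can be checked with an
ordinary least-squares fit of consonant lag on vocalic cycle duration. The
sketch below is only an illustration of that check: the function name is
hypothetical, and the durations are invented so that the lag is a fixed
proportion of the cycle plus a constant. They are not data from Tuller,
Kelso, and Harris (1982).

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented values (msec): vocalic cycle = V1 onset to V2 onset;
# lag = V1 onset to consonant onset, at several speaking rates.
cycle_ms = [200.0, 300.0, 400.0, 500.0]
lag_ms = [70.0, 100.0, 130.0, 160.0]

slope, intercept = fit_line(cycle_ms, lag_ms)
residuals = [lag - (slope * cycle + intercept)
             for cycle, lag in zip(cycle_ms, lag_ms)]
print(round(slope, 3), round(intercept, 3))  # 0.3 10.0
```

Here the slope plays the role of the proportion of the vocalic cycle at
which the consonant is initiated; residuals near zero across rates are what
an invariant linear relationship would look like in such a fit.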

Spoken and Perceived Syllabic Isochrony Reconsidered

The temporal structure of the syllable as just outlined may help to
rationalize the behaviors of talkers and listeners in the experiments by
Morton et al. (1976), Fowler (1979), and Fowler and Tassinary (1981)
summarized earlier. On this interpretation, the measured shortening of a vowel estimates
how much it has been overlaid by surrounding consonants. Estimates of the
effective overlapping of a vowel by a consonant can be obtained by examination
of Figure 2. In the figure, the metronome pulse is temporally equidistant
from the measured vowel offset across the syllables. Moreover, in /ad/, with
no initial consonant, the metronome pulse nearly coincides with the measured
vowel onset. In other syllables, then, vowel shortening is the same as the
interval from the metronome pulse to measured vowel onset. This interval
estimates the interval of effective consonant-vowel overlap in these
syllables. By hypothesis, based on the EMG evidence provided in Tuller and Fowler
(1980), talkers initiate vowels at temporally equidistant intervals under
instructions to produce isochronous sequences of syllables. For their part,
listeners appear to hear vowel timing; moreover, their judgments evidently are
based on the articulatory timing of vowels, not on the timing of their periods
of prominence in the acoustic signal as reflected by usual ways of identifying
their onsets.

Figure 4. Schematic representation of vowel and consonant production. The
horizontal axis represents time and the vertical axis an abstract
dimension, prominence. (See text for explanation.)

Figure 5. A display used by Johansson (1950) to study perceptual vector
analysis. Lights A and C move horizontally back and forth in
phase; light B moves diagonally.

For listeners to hear produced rather than measured vowel timing, they
must segment the speech stream in an unexpected way. They must do so in such
a way that the summed duration of the segmented consonants and vowels exceeds
the duration of the spoken syllable from which they have been segmented. The
duration of the vowel must be its measured duration plus the extent of its
effective overlap by the consonant.
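
The arithmetic of this proposed segmentation can be made explicit with a
small worked example. It is simplified to a syllable with an initial
consonant and a vowel only, and every duration in it is hypothetical; the
function name is likewise introduced here just for illustration.

```python
def perceived_vowel_duration(measured_vowel_ms, overlap_ms):
    """Vowel duration under the proposed segmentation: the measured
    duration plus the interval in which the consonant overlaps the vowel."""
    return measured_vowel_ms + overlap_ms


# Hypothetical durations (msec) for one consonant-vowel syllable.
consonant_ms = 120.0        # interval in which the consonant predominates
measured_vowel_ms = 180.0   # interval in which the vowel predominates
overlap_ms = 60.0           # part of the consonant interval overlapping the vowel

syllable_ms = consonant_ms + measured_vowel_ms
vowel_ms = perceived_vowel_duration(measured_vowel_ms, overlap_ms)
summed_ms = consonant_ms + vowel_ms
excess_ms = summed_ms - syllable_ms
print(syllable_ms, summed_ms, excess_ms)  # 300.0 360.0 60.0
```

In this toy case the segmented consonant and vowel sum to more than the
syllable's duration, and the excess equals the assumed overlap, because the
overlapped interval is counted once for the consonant and once for the
vowel.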

Experiments 1 and 2 are designed to ask whether such a segmentation
occurs in perception. First, however, we ask, in an abstract way, how such a
segmentation might occur.

In the literature on perception, investigators are familiar with an
analogous segmentation in which separate contributions to complex events are
perceptually distinguished. Figure 5 displays an example from Johansson's
research (1950; see also 1974). The figure represents a visual display in
which three moving lights are shown to subjects. The top and bottom lights, A
and C, move horizontally in phase, while a third light, B, moves in a diagonal
trajectory. Viewers do not report seeing two lights moving horizontally and
one diagonally. Instead they report horizontal movement of an apparent rod
extending from A to C, with B moving vertically along the rod.

Based on this and similar evidence, Johansson concludes that viewers
perform a "perceptual vector" analysis in which movements common to a set of
points serve as a perceptual frame relative to which residual motions are
perceived. In the figure, all points include vectors of horizontal motion.
Horizontal motion extracted from points A and C exhausts the description of
their movements, but extracted from B leaves a residual, vertical motion
vector.
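
The decomposition described above can be sketched numerically. The following
is a minimal illustration, not Johansson's procedure: it assumes that each
light's motion is given as a velocity pair and that the common vector can be
estimated as the component-wise median over the lights. The function name
and the velocity values are hypothetical.

```python
from statistics import median

def vector_analysis(velocities):
    """Split each point's velocity into a common part and a residual.

    velocities: dict mapping point name -> (vx, vy).
    The common vector is estimated as the component-wise median, which is
    an assumption of this sketch, not a claim about the visual system.
    """
    common = (median(v[0] for v in velocities.values()),
              median(v[1] for v in velocities.values()))
    residuals = {name: (vx - common[0], vy - common[1])
                 for name, (vx, vy) in velocities.items()}
    return common, residuals

# Lights A and C move horizontally; light B moves diagonally (illustrative values).
lights = {"A": (1.0, 0.0), "B": (1.0, 1.0), "C": (1.0, 0.0)}
common, residuals = vector_analysis(lights)
```

Under these assumptions the common vector is purely horizontal, the residuals
for A and C are zero, and the residual for B is purely vertical, which
parallels the report of B moving vertically along a horizontally moving rod.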

Perceptual vector analysis is a realistic perceptual behavior.
Ordinarily when components of a visual scene move together, they belong to the
same event; consequently, the common movements are appropriately ascribed to
coherent movement of a common frame. Imagine, for example, watching a child
on a merry-go-round. If the child is seated on a horse that moves up and down
relative to the surface on which it is mounted, then the child on the horse in
fact moves in a complex, cycloid motion. The complex motion combines the
rotation of the merry-go-round with the up and down movement of the horse
relative to the floor of the merry-go-round. Observers do not see the complex
movement, however. Instead, and appropriately, they see rotational movement
of the merry-go-round as a whole, and an up-and-down motion of the child and
the horse relative to the rotational movement. That is, they extract
rotational movement, which is common to the merry-go-round and its components.
This exhausts the movement of the merry-go-round's fixed structure, but,
extracted from the motion of the horses, it leaves a vertical motion vector.

When we ask whether a listener can detect a vowel's produced extent
despite coarticulatory overlap of part of it by a consonant, we are asking
whether listeners can do the speech-perception equivalent of a perceptual
vector analysis. We have seen that the vowel serves as the articulatory
foundation of the syllable; for clarity in making the analogy to the visual
examples, we call the vowel the frame. It is produced during syllable-initial
and -final consonants as well as during its own interval of prominence in the
signal. Therefore, acoustic reflections of the vowel's component tongue body
and jaw movements provide the analogue to the vectors of common movement.
These reflections exhaust the contributions to the acoustic signal during the
time that the vowel is the most prominent segment in the syllable, but not
during consonant production. During consonant production, two kinds of
articulatory gesture contribute to the acoustic signal: the relatively slow
gestures of the tongue body and jaw associated with the vocalic frame, and the
relatively fast gestures of the articulators (possibly including the tongue
body and jaw) associated with the consonant. If a perceptual vector analysis
is possible, the gestures common to the vocalic frame may be "factored" from
those specific to the consonantal portion, leaving on the one hand perception
of the whole vocalic frame and on the other hand, as residual, a relatively
context-free version of the consonant.

This proposed analysis, like its visual counterpart, would be a realistic
one for perceivers, because it recovers the natural structure of speech
events.

Experiments 1 and 2 were designed to test two predictions derived from
the hypothesis that listeners perform a perceptual vector analysis on
syllables and, hence, may attend to articulatory timing of vowels in the
experiments outlined at the beginning of Part I. One prediction is that the
effective duration of a vowel for a listener is its measured duration plus its
effective overlap by a syllable-initial consonant. The second prediction is
that information for vowel identity is available to listeners during the
production of an overlaid segment. Experiment 1 tests the first prediction
and Experiment 2 the second. Experiment 3 is designed to assess the relation
between vowel perception and the perceived timing of syllables in experiments
such as that by Morton et al.



EXPERIMENT 1 



To ask whether listeners are sensitive to the temporal microstructure of
syllables and in particular to the relationship of overlap between
syllable-initial consonants and post-consonantal vowels, we used a technique
developed by Raphael. Raphael (1972) has shown that a syllable-final stop or
fricative can be synthesized that is identified as voiced after a
long-duration vowel and voiceless after a short-duration vowel. This is
compatible with the fact that, particularly in English, voiced syllable-final
consonants are preceded by longer vowels than voiceless consonants. By
generating a set of stimuli with a range of vowel durations before the final
consonant, and asking subjects to label the final consonant as voiced or
voiceless, Raphael was able to identify a voicing boundary within the
continuum of vowel durations. The boundary is defined as the vowel duration
at which subjects label the syllable equally often as /d/ or /t/, that is, the
50% crossover point. In later studies, Raphael, Dorman, and Liberman (1975)
and Raphael and Dorman (1980) showed that the crossover point is shifted
toward the /t/ (short-vowel) end of the continuum by a syllable-initial
consonant. That is, the final consonant is heard more frequently as /d/ when
a consonant precedes the vowel than when the vowel is syllable-initial. This
may indicate that the vowel is heard as longer when preceded by a consonant
than when it is syllable-initial. For syllable-initial /d/, all or most of
the transitions (which in these stimuli were necessary to specify the initial
/d/) were also heard to belong to the vowel. This interpretation is
consistent with the facts of production; the direction and extent of F2
transitions appropriate for /d/ are conditioned by the following vowel because
the two segments are coarticulated during the release of the consonant.

In the study by Raphael et al. (1975), an initial /r/ also shifted the
/t/-/d/ boundary substantially, whereas steady-state frication characteristic
of /s/ shifted it only slightly. This latter outcome was replicated in
Raphael and Dorman with natural speech. These experiments made it clear that
the perceived voicing of a final stop can be affected by vowel length. In the
following experiment, I attempt to extend these findings to some of the
syllables depicted in Figure 2. If adding initial consonants to a vowel
increases the vowel's effective duration, then, following Raphael et al., we
should observe a change in the voicing boundary of syllables beginning with
/a/, /b/, /m/, and /s/. Furthermore, we predict a greater effective
lengthening of the vowel by consonants that according to Figure 2 shorten the
vowel substantially (for example, /s/) than by those that shorten it very
little (for example, /b/). (This prediction may appear contradictory to the
findings of Raphael et al., who found limited effects of /s/ on apparent vowel
duration and substantial effects of /d/. The difference in prediction and
outcome derives from a difference in measurement criteria for the vowel. In
experiments by Raphael et al., voiced formant transitions following release of
/d/ were identified as belonging to the consonant and not to the vowel; hence
when the addition of transitions affected the voicing judgments, the influence
was identified as one of the consonant on the effective duration of the vowel.
In our measurements, however, voiced formant transitions are included in the
measurement of vowel duration. Therefore, the predicted additional effect of
a voiced stop such as /d/ or /b/ on voicing judgments is small.)

Method 

Stimuli and materials. We selected the syllables /ad/, /bad/, /mad/, and
/sad/ spoken by two of the talkers who provided the data for the experiment
reported by Fowler and Tassinary (and were two of the three talkers who
provided the data shown in Figure 2).⁶ These syllables had shown a range of
vowel shortening that spanned 20 msec collapsed over the two talkers. The
order of measured vowel durations decreased in the series: /ad/, /bad/,
/mad/, and /sad/.

For each talker, a single token of each of the four syllables was
selected from the nonmetronome condition of the experiment reported by Fowler
and Tassinary (1981). These syllables were digitized and edited using the
pulse-code modulation system at Haskins Laboratories.

The final portion of the syllable /ad/ was spliced from the rest (50 msec
for talker 1 and 85 msec for talker 2). The portion excluded any voicing
during the closure for the /d/ and any release of the /d/ to facilitate a
shift in identification from /d/ to /t/. This final section of the syllable
/ad/ replaced the final portion of the other three syllables to ensure that
the final consonant of the four syllables was equivalently /d/- or /t/-like.
Finally, the vowels in each syllable were made equal in duration (within a
pitch pulse) by deleting pitch pulses from the steady-state portions of
syllables with longer vowel durations. The initial vowel durations of the
four syllables averaged 225 msec for talker 1 and 256 msec for talker 2. From
each of these syllables, a 10-step continuum was constructed by successively
deleting one pitch pulse for talker 1 and two for talker 2 (a female) taken
insofar as possible from the relatively steady-state portion of the vowel.
This gave continua with a range of approximately 75 msec for talker 1 and 90
msec for talker 2.
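
The arithmetic of the continua can be sketched as follows. This is an
illustration only: the pitch-pulse durations are not reported, so they are
back-computed here from the stated ranges (an assumption of the sketch), and
`build_continuum` is a hypothetical helper name.

```python
def build_continuum(initial_ms, pulse_ms, pulses_per_step, n_steps=10):
    """Vowel durations for a continuum made by deleting pitch pulses.

    Step 0 is the unshortened vowel; each later step removes
    `pulses_per_step` further pulses of `pulse_ms` each.
    """
    return [initial_ms - i * pulses_per_step * pulse_ms for i in range(n_steps)]

# Assumed pulse durations: range / number of pulses deleted across 9 steps.
talker1 = build_continuum(225.0, 75.0 / 9, pulses_per_step=1)
talker2 = build_continuum(256.0, 90.0 / 18, pulses_per_step=2)

range1 = talker1[0] - talker1[-1]   # about 75 msec
range2 = talker2[0] - talker2[-1]   # about 90 msec
```

On these assumptions the implied pitch pulse is roughly 8 msec for talker 1
and 5 msec for talker 2, which is in line with the note that talker 2 was
female.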

For each talker, four test orders were constructed, one for each
continuum (syllable). Each test order began with twenty trials in which the
two endpoints of the continuum were repeated ten times each in alternation.
These served to familiarize the listeners with the most /d/- and /t/-like
sounds they would hear. The introductory series of 20 trials was followed by
100 trials in which the 10 stimuli were presented 10 times each in random
order. This pattern, 20 trials in which the endpoint stimuli were repeated in
alternation, and 100 randomized trials, was repeated twice more for a total of
60 introductory trials and 300 test trials. The first third of the test
served as practice; the data to be reported are from the last set of 200 test
trials. There were 2 seconds between trials with a longer delay of 4 seconds
following every tenth trial.

Design. Subjects were nested within the four levels of the independent
variable, Syllable (/ad/, /bad/, /mad/, and /sad/), and the two levels of the
variable, Talker. With a single exception, eight subjects were assigned to
each cell in the design. Only seven subjects were run for the syllable /bad/
produced by the first talker. We expected a shift in the /d/-/t/ boundary
toward the short-vowel (/t/) end of the continuum progressively in the
sequence /ad/, /bad/, /mad/, and /sad/.

Procedure. Subjects listened to the test orders over earphones in groups
of one to four in a sound-treated room. They were instructed to listen to the
initial twenty sounds of alternating /d/- and /t/-final syllables on each
third of the test, writing "d" or "t" as appropriate on their answer sheet as
they followed along. On the next 100 trials in each third of the test, they
were instructed to write "d" or "t" depending on which final consonant they
heard, choosing only between the responses "d" and "t."

Subjects. Subjects were 63 introductory psychology students at Dartmouth
College.

Results and Discussion

The prediction, that the voicing boundary would shift toward /t/
progressively in the series /ad/, /bad/, /mad/, and /sad/, was assessed by
comparing the four syllables on the measure of number of "d" responses to each
stimulus in the continuum. Figure 6 displays the results of this procedure
collapsed over talkers 1 and 2. The ogival curves for the four syllables
cross over the 50% point in just the predicted order. Interpolating from the
figure, the boundaries for /ad/, /bad/, /mad/, and /sad/ are 5.56, 5.70,
5.90, and 6.59.
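
A boundary of this kind can be estimated by linear interpolation between the
two stimuli that bracket the 50% point. The sketch below is one simple way to
do so and is not necessarily the procedure used to read Figure 6; the function
name and the response proportions are hypothetical, invented only to show the
computation.

```python
def crossover(stimuli, prop_d, criterion=0.5):
    """Stimulus value at which the proportion of "d" responses crosses
    the criterion, by linear interpolation between adjacent stimuli.
    Returns None if the criterion is never crossed."""
    points = list(zip(stimuli, prop_d))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if (p0 - criterion) * (p1 - criterion) <= 0 and p0 != p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None

# Hypothetical labeling function: stimulus 1 has the longest vowel.
stimuli = list(range(1, 11))
prop_d = [1.0, 1.0, 0.95, 0.9, 0.7, 0.4, 0.2, 0.1, 0.0, 0.0]
boundary = crossover(stimuli, prop_d)
```

With these made-up proportions the boundary falls about two thirds of the way
from stimulus 5 to stimulus 6.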

In an analysis, the average number of "d" responses given to the four
syllables was compared for stimuli near the voicing boundaries, that is,
stimuli 5, 6, and 7. Collapsed over Talker and Stimulus number (5-7), since
neither variable interacted with Syllable, the average number of "d" responses
out of 20 to the four syllables was 7.4, 8.7, 9.1, and 11.4. This increase
reflects the increasing resistance to labeling the final consonant as "t"
throughout the series. The increase was significant according to a trend test
in which the mean for each syllable was weighted according to its measured
vowel shortening in the syllables displayed in Figure 2. In the analysis,
both subject and talker were treated as random factors, F(1,5) = 18.86, p =
.02.

In this analysis, listeners' judgments of syllables produced by talker 1
showed just the predicted increase while their judgments of talker 2 showed a
reversal of /bad/ and /mad/. This reversal in fact occurred on just one of
the three crossover stimuli.

The outcome of this analysis, though certainly not striking, is
compatible with the hypothesis that the duration of the vowel as perceived by
listeners increases with increases in the vowel's measured overlap by the
consonant (its measured shortening). Nonetheless, whereas the range of
shortening was about 20 msec in the experiment by Fowler and Tassinary, the
difference in perceived vowel duration as assessed by the present experiment
was only about 10 msec.

EXPERIMENT 2

Experiment 1 has an alternative interpretation to the one that we have
proposed. Possibly, listeners are familiar with different durations of vowels
following /b/, /m/, and /s/; consequently they expect relatively shorter
vowels following /s/ than /m/ and following /m/ than /b/. If so, the results
of Experiment 1 document those expectations, but do not reveal a tendency to
hear a vowel during that part of the acoustic signal in which vowels and
consonants coarticulate but consonants predominate in the signal.

Experiment 2 was designed to provide evidence converging with Experiment
1 that perceivers extract vowel information during production of segments that
coarticulate with it. If they do, then time to identify a vowel, timed from
the vowel's measured acoustic onset, should be shorter the more extensive its
effective overlap with preceding segments. Estimating overlap by vowel
shortening, then, time to identify /a/ should be shorter in /sa/ than in /ma/
and shorter in /ma/ than in /ba/. Experiment 2 was designed to test that
prediction.

Method



Stimuli. Stimuli were naturally produced VCV disyllables in which the
first vowel was unstressed schwa, the consonant was /b/, /m/, /s/, or /p/, and
the second vowel was /a/ or /i/. A disyllable with /p/ replaced the syllable
/ad/ in Experiment 1. As Figure 2 shows, vowel shortening after /p/ is
greater than that following /s/. Therefore, predicted time to identify a
vowel is expected to decrease in the series əbV, əmV, əsV, əpV.⁷

Three tokens of each disyllable were produced, giving 24 different
stimuli in all. The stimuli were randomized into five 48-trial blocks with
the constraint that in each block each token occurred twice. Stimuli were
recorded on audiotape with 2 seconds between trials and 10 seconds between
blocks.
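
The block structure just described can be sketched as follows. This is a
minimal illustration under stated assumptions, not a reconstruction of how the
original tape was assembled: the token labels, the seed, and the function name
`make_blocks` are hypothetical, and simple shuffling within each block is
assumed.

```python
import random

CONSONANTS = ["b", "m", "s", "p"]
VOWELS = ["a", "i"]

def make_blocks(n_blocks=5, n_tokens=3, repeats=2, seed=0):
    """Return randomized blocks in which every token occurs `repeats` times."""
    rng = random.Random(seed)
    # 4 consonants x 2 vowels x 3 tokens = 24 distinct stimuli.
    tokens = [(c, v, t) for c in CONSONANTS for v in VOWELS
              for t in range(1, n_tokens + 1)]
    blocks = []
    for _ in range(n_blocks):
        block = tokens * repeats      # each token twice -> 48 trials
        rng.shuffle(block)            # random order within the block
        blocks.append(block)
    return blocks

blocks = make_blocks()
```

Each of the five blocks then contains 48 trials, with every one of the 24
tokens appearing exactly twice.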

Table 1 provides durational measures of the stimuli. Measures of schwa
duration were taken from the onset of periodicity in the signal to closure for
the consonant. For the consonants, the onset of the closure interval to the
onset of voicing for the vowel was measured. Stressed vowels were measured
from the earliest evidence of voicing following release of the consonant to
signal offset. As others have found (see also Figure [?]), the durations of
consonants and stressed vowels were negatively correlated (r = -.76).
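
The reported correlation can be checked against the tabled means. The sketch
below computes a Pearson correlation over the eight disyllable means as read
from Table 1; it assumes that the correlation was computed over these eight
means (the text does not say), and the scan of the table is imperfect, so the
input values should be treated as a best reading.

```python
import math

# Consonant and stressed-vowel durations (msec), row by row from Table 1.
consonant_ms = [128, 123, 144, 123, 189, 195, 206, 218]
vowel_ms = [434, 465, 402, 390, 387, 337, 370, 371]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

r = pearson_r(consonant_ms, vowel_ms)
```

Rounded to two places, the value obtained this way agrees with the
correlation given in the text.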

Table 2 provides measures of F2 during the initial schwa of each
disyllable. (Measures were obtained using the ILS analysis package at Haskins
Laboratories.) Measures were taken during the four 20 msec time frames
preceding closure for the consonant. The table shows that F2 for schwa is
lower when the forthcoming stressed vowel is /a/ than when it is /i/. This is
compatible with the substantially higher F2 for the high vowel /i/ than for
the low vowel /a/ and indicates that anticipatory coarticulation of the
stressed vowel precedes closure for the consonant (see also Fowler, 1981a,
1981b).

Figure 7 displays this more clearly by plotting the difference between F2
for /ə/ preceding /i/ and /a/ separately for each disyllable pair during the
last four 20 msec intervals preceding consonant closure. This evidence of
coarticulation is compatible with Ohman's findings and other evidence cited
earlier.

Until the final frame, disyllables including /b/ and /m/ appear to be
more differentiated than those containing /s/ and /p/. If listeners use
average frequency of the second formant of schwa over these time frames as a
source of information about the forthcoming vowel, they will not show the rank
ordering of response times we have predicted. However, the predicted ordering
is reflected in the rate of change in the plotted difference score over the
last three frames where the change is monotonic; /b/ shows the lowest rate of
change and /p/ the highest. If this measure reflects information about
ongoing adjustments in vocal tract shape for the forthcoming vowel to which
listeners are sensitive, then Figure 7 may offer acoustic support for the
predicted ordering of response times.
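
The difference scores plotted in Figure 7 can be recomputed from Table 2. The
sketch below uses the F2 values as read from the table (the scan is imperfect,
so a few digits are a best reading) and takes "rate of change over the last
three frames" to mean the change in the difference score from frame 3 to
frame 1, which is an assumption of this sketch.

```python
# F2 of schwa in frames 4, 3, 2, 1 before closure, by consonant and
# following stressed vowel, as read from Table 2.
F2 = {
    "b": {"a": [1464, 1406, 1334, 1304], "i": [1676, 1641, 1628, 1619]},
    "m": {"a": [1469, 1470, 1403, 1314], "i": [1755, 1687, 1640, 1670]},
    "s": {"a": [1689, 1698, 1693, 1702], "i": [1794, 1791, 1853, 1921]},
    "p": {"a": [1451, 1415, 1373, 1328], "i": [1517, 1426, 1517, 1683]},
}

# Difference score per frame: F2 before /i/ minus F2 before /a/.
diff = {c: [hi - lo for lo, hi in zip(v["a"], v["i"])] for c, v in F2.items()}

# Change in the difference score across the last three frames (3 -> 1).
change = {c: d[-1] - d[1] for c, d in diff.items()}

lowest = min(change, key=change.get)
highest = max(change, key=change.get)
```

Computed this way, /b/ shows the smallest change and /p/ the largest, with
/m/ and /s/ falling between them.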



Design. The major independent variable was consonant identity; a second
was vowel identity. All subjects participated at all levels of the
independent variables. The dependent variable was time to classify the vowel
timed from the vowel's measured onset. Based on the findings of Fowler and
Tassinary displayed in Figure 2, I expected reaction time to classify a vowel
as /i/ or /a/, measured from the acoustic onset of the vowel's period of

Table 1

Durational Measures (msec) of the Disyllables Used in Experiments 2 and 3,
Averaged Over the Three Tokens of Each Type.

disyllable      ə       C       V

əba            61     128     434
əbi            61     123     465
əma            43     144     402
əmi            56     123     390
əsa            45     189     387
əsi            51     195     337
əpa            42     206     370
əpi            40     218     371



Table 2

Measures of F2 of Schwa During the Four 20 msec Frames Preceding Consonant
Closure.

                 Frame number before closure

disyllable       4        3        2        1

əba           1464     1406     1334     1304
əbi           1676     1641     1628     1619
əma           1469     1470     1403     1314
əmi           1755     1687     1640     1670
əsa           1689     1698     1693     1702
əsi           1794     1791     1853     1921
əpa           1451     1415     1373     1328
əpi           1517     1426     1517     1683



[Figure 7 plot. Horizontal axis: frames before closure, 4 to 1 (20 msec per
frame).]

Figure 7. Anticipatory coarticulation of stressed /i/ and /a/ in the
disyllables of Experiments 2 and 3. F2 of initial schwa in əCa
subtracted from F2 of schwa in əCi is plotted for each of the four
disyllable pairs and for four 20-msec frames preceding closure of
the consonant.

prominence, to decrease in the series əbV, əmV, əsV, əpV because the measured
vowel durations decrease in the series. I had confidence that this rank
ordering of vowel durations is stable because the same rank ordering is
reported by House and Fairbanks (1953) for vowels in symmetrical bVb, mVm,
sVs, and pVp contexts. Having previously examined only stimuli in which the
stressed vowel was /a/, there was no reason to expect a difference in reaction
time to /i/ or /a/ nor any interaction between the variables, consonant and
vowel identity.

Procedure. Subjects were tested individually. They listened to the test
sequence over earphones, classifying the stressed vowel on each trial as /i/
or /a/ by making a button-press response. For half the subjects, /i/
corresponded to the left-hand button and for the other half, /a/ corresponded
to the left-hand button. Responses and reaction times were collected by
microcomputer. Times were measured from the acoustically-defined vowel onset
by placing a click on the second channel of the audio tape, 100 msec prior to
measured vowel onset on the first channel. In the experiment, these clicks
caused a millisecond clock to be read; the clock was read again on receipt of
the subject's button-press response, and the difference in the times minus 100
msec was the subject's reaction time.
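
The timing computation amounts to one subtraction. The sketch below restates
it; the function and parameter names are hypothetical, and the clock readings
in the example are invented for illustration.

```python
CLICK_LEAD_MS = 100  # the click precedes measured vowel onset by 100 msec

def reaction_time(click_clock_ms, response_clock_ms, click_lead_ms=CLICK_LEAD_MS):
    """Reaction time relative to measured vowel onset.

    click_clock_ms: clock reading when the click was detected.
    response_clock_ms: clock reading when the button press arrived.
    """
    return (response_clock_ms - click_clock_ms) - click_lead_ms

# Hypothetical readings: click at 2000 msec, button press at 2583 msec.
rt = reaction_time(2000, 2583)
```

A response that arrived before the measured vowel onset would come out
negative under this definition.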

Subjects were instructed to make their responses as quickly as possible
but to minimize errors.

Subjects. Subjects were 14 undergraduates at Dartmouth College.

Results

Results are reported for the final four blocks of the experiment, the
first block serving as practice. Subjects were quite accurate, averaging 95%
correct overall.

Average reaction times to the disyllables əbV, əmV, əsV, and əpV were
483, 468, 463, and 424 msec, respectively. The effect of consonant identity
is significant, F(3,39) = 33.7, p < .001. More importantly, however, the
decrease in reaction time in the series occurred as predicted. Based on the
measured shortening in Figure 2 (averaged over three talkers, those whose
productions provided stimuli for Experiment 1 and one other), the predicted
differences in reaction time in the series are 14 msec for əbV versus əmV, 8
msec for əmV versus əsV, and 8 msec for əsV versus əpV. The first two
predicted differences fit the observed differences fairly well; however, the
obtained difference between əsV and əpV is 39 msec rather than the predicted 8
msec. A planned comparison weighting reaction times according to the
predicted differences is highly significant, F(1,39) = 81.10, p < .0001.
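
The comparison of predicted and observed differences can be laid out
explicitly. The sketch below uses only the mean reaction times and predicted
differences given in the text; the variable names are hypothetical, and plain
labels stand in for the disyllable types.

```python
# Mean reaction times (msec) reported for each disyllable type.
mean_rt = {"bV": 483, "mV": 468, "sV": 463, "pV": 424}

# Predicted successive differences (msec), from measured vowel shortening.
predicted = {("bV", "mV"): 14, ("mV", "sV"): 8, ("sV", "pV"): 8}

# Observed successive differences and their departure from prediction.
observed = {pair: mean_rt[pair[0]] - mean_rt[pair[1]] for pair in predicted}
error = {pair: observed[pair] - predicted[pair] for pair in predicted}
```

The observed steps are 15, 5, and 39 msec, so the first two are within a few
milliseconds of prediction and the last exceeds it by 31 msec.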

The main effect of vowel identity is nonsignificant in the analysis,
F(1,13) = 1.65, p = .22, but the interaction between consonant and vowel
identity is significant, F(3,39) = 9.55, p < .001. One reason for the
interaction is that the ordinal relation of əmV and əsV is as predicted when
the vowel is /i/ (465 msec versus 441) but is reversed when the vowel is /a/
(472 versus 484). In addition, when the vowel is /i/, reaction times to əsV
and əpV are the same (441 msec) but differ when the vowel is /a/ (406 versus
484). We had no reason to predict a difference in rank ordering of reaction
times based on vowel identity because in earlier studies the vowel was
invariably /a/. Whereas articulatory support for this interaction or other
reasons for it will have to be investigated, the reasons for the interaction
will not be pursued here. However, a similar interaction will be sought in
listeners' assessments of the timing of the syllable sequences in the next
experiment.

Discussion 

This experiment provides evidence that vowels are detected during
intervals when the vowels coarticulate with prevocalic segments (including
initial consonant and the preceding schwa). Experiment 2 shows that the time
to identify a vowel, timed relative to measured vowel onset, is correlated
with the vowel's measured shortening. Based on the coarticulation evidence
cited earlier (and represented schematically in Figure 4), we interpret the
relative shortening as an index of relative overlap by the prevocalic
consonant (and, perhaps, by the unstressed schwa; see also Fowler, 1981a,
1981b). Therefore we interpret the decrease in vowel classification time with
shortening as evidence that listeners use information for the vowel in the
prevocalic segments as information for vowel identity.

These results converge with those of Experiment 1. That experiment found
that the measured duration of a vowel at which judgments of voicing of a
syllable-final consonant shift from voiced to voiceless decreases
progressively in the series /ad/, /bad/, /mad/, and /sad/. One interpretation
of this outcome is that listeners are sensitive to the shortening effects of
consonants and vowels displayed in Figure 5a, but another interpretation is
prompted by the results of Experiment 2. It is that the effective duration of
a vowel for a listener is the vowel's measured duration plus the overlap of
part of its perceived extent by a syllable-initial consonant.

Previous experiments in this series (Fowler, 1979; Fowler & Tassinary,
1981) have used the vowel /a/ exclusively. Experiment 2 introduced the vowel
/i/ and obtained an interaction between initial consonant and vowel in vowel
classification times. In Experiment 3, assessments are made of the relative
rhythmic alignment of the syllables used in Experiment 2. If perception of
vocalic timing underlies the perception of speech rhythm, as we propose, then
the interaction found in Experiment 2 should be reflected also in listeners'
rhythmic alignments of these disyllables. Experiment 3 tests this prediction.

EXPERIMENT 3 

In this experiment, we relate listeners' vowel classification times,
obtained in Experiment 2, to listener perceptions of rhythmicity, which we
propose have their bases in perception of cyclic vowel production. In
addition we also assess the relation of listeners' consonant classifications
to their perception of rhythm. According to the view of perception being
developed here, consonant classifications are not related to the perceived
timing of syllables.

Method

Stimulus materials. The experiment used the audio tape devised for the
vowel classification task of Experiment 2.

Procedure. In Experiment 2, subjects were asked to classify the stressed
vowel on each trial as /i/ or /a/. In the present experiment, one group of
subjects was asked to tap a key in time with the successive disyllables,
tapping once for each disyllable at a point corresponding to the syllable's
"beat." This technique, like the metronome technique used by Rapp (1971) and
Fowler and Tassinary (1981), enables discovery of the perceived temporal
alignment of different syllables (see Figure 2).

A second group of subjects was asked to classify the consonants on each
trial as /b/, /m/, /p/, or /s/, making a button-press response as quickly as
possible. Assignment of phoneme labels to buttons was varied over subjects.

Design. As in Experiment 2, independent variables are consonant identity
(/b/, /m/, /p/, /s/) and stressed vowel identity (/i/, /a/). The dependent
measure is response time, initially measured relative to measured vowel onset
and next relative to measured stressed syllable onset. We expected vowel
classification times obtained in Experiment 2 to correlate with tap times in
the present experiment. This would suggest a close relation between
information necessary to identify a vowel and perceived relative timing of the
disyllables. No such relation was predicted between consonant classification
times and tapping times.

Subjects. Subjects were 30 Dartmouth undergraduates. Fifteen
participated in the tapping task and 15 in the consonant classification task.

Results

When tapping times are measured relative to vowel onset, the effect of
consonant is highly significant, F(3,42) = 297.78, p < .0001. Tap times
follow vowel onset by 207 msec, 187 msec, 137 msec, and 125 msec for the
disyllables əbV, əmV, əsV, and əpV, respectively. This is exactly the rank
ordering of disyllables obtained in Experiment 2, although responses to əsV
are closer in reaction time to əpV in the present experiment and to əmV in
Experiment 2.

As in Experiment 2, the effect of vowel identity is nonsignificant,
F(1,14) = 2.16, p = .16, but the interaction is significant, F(3,42) = 20.63,
p < .001. In Experiment 2, there were two reasons for the interaction.
First, the rank ordering of times to əmV and əsV was as predicted (based on
measured shortening in Figure 2) when the vowel was /i/, but reversed when the
vowel was /a/. Next, there was no difference in reaction time to əsi and əpi
but a large difference between əsa and əpa. In the present experiment, the
predicted rank ordering of əmV and əsV was obtained for both vowels. However,
as in Experiment 2, there was essentially no difference in tapping times to
əsi and əpi (123 versus 121 msec), but the predicted direction of difference
appeared between əsa and əpa.

Table 3

Measures of Response Time (msec) in Experiments 2 and 3 Timed from Onset of
Acoustic Energy for the Consonant. (In parentheses, timed from onset of
closure for /b/ and /p/. In brackets, the standard deviation.)

disyllable    tap               consonant          vowel

əba           (328) 205 [45]    (728) 605 [83]     (618) 495 [74]
əbi           (338) 218 [46]    (757) 637 [82]     (600) 480 [79]
əma                 328 [55]          670 [83]           616 [72]
əmi                 313 [54]          668 [83]           588 [77]
əsa                 339 [59]          683 [59]           673 [73]
əsi                 320 [54]          673 [73]           638 [94]
əpa           (335) 233 [58]    (762) 660 [127]    (612) 510 [83]
əpi           (339) 246 [49]    (703) 610 [99]     (660) 567 [76]



Table 3 provides mean response times in the tapping and consonant
classification tasks, respectively, with response times now measured relative
to onset of acoustic energy for the consonant (that is, release for /b/ and
/p/). Table 3 provides comparable times for the vowel classifications of
Experiment 2. As predicted, vowel and tap times pattern similarly. The
correlation between them, computed over the eight disyllables, is .95.
Consonant times also pattern similarly to tap times (r = .79). Moreover, the
patterns of vowel and consonant times are correlated (r = .73). All of these
correlations are significant. However, the significant relationship between
tap times and consonant response times is due to shared variance between vowel
and consonant times. When that variance is partialed out, the correlation
between tap times and consonant times falls to .46, a nonsignificant value.
In contrast, when variance shared by consonant- and vowel-identification times
is partialed from the tap-vowel correlation, the partial correlation remains
significant (r = .90). In a multiple regression analysis, only the vowel
times contribute significantly to predictions of tap response times. This
suggests that perceived timing of stressed syllables is a function only (or
primarily) of perceived information pertaining to vowel identity, as predicted,
and is not significantly a function of perceived consonant identity.
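
The partial correlations reported above can be checked from the three
pairwise correlations alone. The following is a minimal sketch, assuming the
standard first-order partial correlation formula; the function name
partial_corr and the constant names are illustrative and are not part of the
original analysis. Because the inputs are the two-decimal values given in the
text, the results (roughly .45 and .89) agree with the reported .46 and .90
only to within rounding.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Pairwise correlations as reported in the text (rounded to two decimals).
R_TAP_VOWEL = 0.95
R_TAP_CONSONANT = 0.79
R_VOWEL_CONSONANT = 0.73

# Tap-consonant correlation with vowel times partialed out.
tap_consonant_given_vowel = partial_corr(
    R_TAP_CONSONANT, R_TAP_VOWEL, R_VOWEL_CONSONANT
)
# Tap-vowel correlation with consonant times partialed out.
tap_vowel_given_consonant = partial_corr(
    R_TAP_VOWEL, R_TAP_CONSONANT, R_VOWEL_CONSONANT
)

print(f"tap-consonant, vowel partialed out: {tap_consonant_given_vowel:.2f}")
print(f"tap-vowel, consonant partialed out: {tap_vowel_given_consonant:.2f}")
```

The asymmetry is visible directly: removing the vowel times nearly halves the
tap-consonant correlation, whereas removing the consonant times leaves the
tap-vowel correlation close to its original value.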

DISCUSSION OF EXPERIMENTS 1-3 

We have attempted to establish a relationship on the one hand between the
temporal and articulatory structures of spoken syllables, and on the other
hand between both of these systematic properties of produced speech and the
perceived timing of syllables in productions that talkers intend to be
rhythmical. We have proposed that measured vowel shortening in the context of
surrounding consonants is an index of coarticulatory overlap of the vowel by
consonants. This proposal is supported by the coarticulation literature,
which shows that vowels are coproduced with consonants (Barry & Kuenzel, 1975;
Butcher & Weiher, 1976; Carney & Moll, 1971; Öhman, 1966) and provides
evidence for vowel-to-vowel gestures of the tongue body occurring concurrently
with medial consonant production. Based on our elaboration of Öhman's
proposal suggesting that vowels are produced continuously in sequences of
stressed vowels, we hypothesized that the perceived timing of syllables is
based on the perceived timing of vowels.

The research presented here supports this view, showing that the perceived
duration of a vowel (Experiment 1) and the time necessary to identify a
vowel (Experiment 2) both are affected by the identity of the syllable-initial
consonant. In particular, Experiment 1 showed that the more extensive the
shortening effect of a consonant on a vowel (and hence, by hypothesis, the
more the consonant overlaps the vowel) the more the consonant helps resist
shifts in perceived voicing of the syllable-final consonant, which occur as
the vowel's measured duration decreases. Experiment 2 found that the more
extensive the shortening effect of a consonant on a vowel, the shorter the
subject's response time to classify the vowel as /i/ or /a/ timed from the
vowel's measured onset.

Experiment 3 established a relation between perception of the stressed
vowel in a sequence of disyllables and the perceived timing of the sequence. 
Vowel classification times and tap times were highly correlated. 

Some problems with the present view of vowel production as continuous 
have been raised in a recent paper by Shaffer (1982). Shaffer points out that 
with changes in rate of production, vowels change in duration more than 
consonants. But if vowels and consonants were produced coordinately but 
separately as proposed here, either of two different outcomes would be 
expected. Just one segment type might be affected by rate change without any 
effect on the other; alternatively, being coordinate, consonants and vowels
might change proportionately. Neither outcome corresponds to what is ob- 
served. 

There is a way in which separate, but coordinate segment types could
change disproportionately, however. There is nonlinearity in the articulatory
system in the form of an upper limit on segment shortening due to rate
changes. If, at slow rates of talking, consonants are closer to this limit
than are vowels, then they would shorten less with an increase in rate than
do vowels. Consonant gestures are faster than vocalic gestures at slow or
conversational rates of talking. In a recent study, Tuller, Harris, and Kelso
(1982) report a shorter duration of muscle activity supporting consonant than
vowel production at a slow rate of talking. At a fast rate, duration of
activity for the consonant and vowel is more similar, that for the consonant
having decreased by 13% and that for the vowel by 23%.

Shaffer also argues that the present proposal "fails to account for the
coarticulation of consonants and for coarticulation across syllable
boundaries; it does not consider the timing of postvocalic consonants or show
why syllable duration is affected by the size of the consonant clusters"
(p. 121). The present view does fail to account for the coarticulation of
consonants, but only because it does not yet address consonant production
except in relation to vowel production. Consonants are considered primarily
as they may affect perceived rhythm, or, more often, as they mask evidence of
vowel production used by listeners to guide rhythm judgments. However, I do
not detect anything in principle that will prevent incorporation of
information about relative timing of consonants into a theory of vowel
production as separate from consonant production. The timing of postvocalic
consonants relative to the vowel, and coarticulation of consonants with vowels
across syllable boundaries, are addressed.

As for increases in syllable duration with increases in consonant cluster
size, the theory can offer two possible hypotheses. Segments have compression
limits (e.g., Klatt, 1976). In particular, the constraint that consonants be
initiated at a particular phase in the production of vowels (Tuller et al.,
1982) may prevent excessive overlap of the vowel by consonants in a cluster.
If so, then production of a large cluster may force a discontinuity in vowel
production with the consequence that initial consonants in a prevocalic
cluster may not coarticulate with the following vowel but may with a preceding
vowel; similarly, final consonants in a postvocalic cluster may not
coarticulate with the preceding vowel, but may with the subsequent one.
However, in view of the findings that stressed vowels coarticulate over long
extents when unstressed vowels follow (Bell-Berti & Harris, 1979; Fowler,
1981a, 1981b), a different outcome is also possible. Consonant clusters may
force an increase in the duration of a vowel cycle to preserve continuity of
the vowel stream. Further research will have to distinguish these
possibilities and to distinguish them from others that might be proposed.

PART II: CONTRIBUTIONS FROM PHONETICS AND PHONOLOGY

In this part of the paper, I will develop the three ideas outlined in the
introduction. First is the general idea that investigation of language
structure, which proceeds largely independently from studies of language use,
can provide a useful source of evidence converging (or failing to converge)
with results of experimental studies. The second, more specific idea is that
some phonological rules are "natural" in the specific sense that they reflect
exaggerations and conventionalizations of articulatory dispositions. Insofar
as they can be identified as such, they offer a source of evidence concerning
the nature and identity of some dispositions. Third, I provide examples that
I suggest are exaggerations and conventionalizations of the articulatory
tendency to produce vowels in a continuous, cyclic fashion.

Phonological descriptions of languages characterize systematic properties
in the phonological forms of lexical items. That is, the descriptions factor
systematic (general) phonological properties common to lexical items,
expressed as general rules, from properties idiosyncratic to individual items.
This factoring reveals a number of characteristics of the lexicons of
languages that are relevant to psychological interests. Spoken language
systems exist only as they are used by speaker/hearers; moreover, they are
evolutionary acquisitions of speaker/hearers. In view of these facts,
systematic phonological properties provide clues to the nature of the
speaker/hearers themselves (see also Chomsky [1980], who, however, focuses on
their revealed cognitive nature, rather than on their perceptual and
articulatory natures as I will emphasize here).

Some of these clues appear to be more fundamental or significant than
others. They are systematic properties that are popular across languages.
For example, many languages devoice final obstruents. In German, the noun
Bund is pronounced /bunt/ in the nominative, but /bund/ in the genitive
Bundes. In Polish, "snow" is /s'n'ek/ in the nominative, but /s'n'ega/ in the
genitive. In Russian, the nominative of "leg" is /noga/ but the genitive
plural is /nok/. (The German example is from Comrie, 1980, and the Polish and
Russian examples from Kenstowicz & Kisseberth, 1979.) That this phonological
rule is somehow natural to language users is suggested by the fact that
children learning language also have a tendency to devoice final consonants.
This occurs even in English where it is inappropriate (Oller, Wieman, Doyle, &
Ross, 1976).

Systematic phonological properties that are popular across languages may
be popular for a reason. Indeed, there may be many reasons why a particular
kind of systematic property is favored by languages, but of interest here is
the possibility that many properties are natural in resembling articulatory
dispositions. Word-final devoicing may be an example.

If some phonological regularities do resemble articulatory dispositions,
then phonological investigation can serve a useful function for psychological
investigation of speech production. Articulation is difficult to study with
respect to issues of psychological (as opposed, say, to physiological)
interest, not simply because the articulators are difficult to access, but
also because direct study of articulation tends to provide more detail than
current psychological perspectives on speech-motor control can organize and
explain. Identification of popular, systematic properties of the phonologies
of languages can contribute to direct study of articulation in two ways.
First, it can suggest the kinds of articulatory regularities that have served
as resources for the evolution of phonologies. These suggestions can help to
focus the search for regularities or organizing principles in articulation.
Next, it can serve as converging evidence for hypothetical organizing
principles--such as that of cyclic vowel production--that may have emerged,
perhaps dimly, from articulatory or perceptual investigations of speech. That
is the use to which phonological evidence will be put here.


Systematic and Idiosyncratic Properties of Language 

Not all systematic properties of lexical items are factored out in
phonological rule systems. Two kinds of systematic properties of lexical
items can be identified that I will call "conventional" and "necessary."
Conventional systematic properties are expressed by general rule, while
necessary ones are not. Conventional systematic properties are specific to
individual languages; they are conventions, which are used to convey
linguistic information. An example is the formation of the plural in English.
The plural is formed by adding (morphological) "s" to a word. The
pronunciation of the "s" is conditioned in a ruleful way by properties of the
phonological segment adjacent to which the "s" is appended. If the segment is
unvoiced, and is neither a fricative nor an affricate, the plural is realized
as /s/. If the segment is voiced and neither a fricative nor an affricate,
the plural is /z/. Otherwise the realization is /Iz/. This conditioning is
systematic--it can be expressed as a rule--but it is a convention. An
alveolar fricative after a voiced segment need not be voiced (witness "dance,"
phonemically /dæns/). And other languages have other plural formation rules.
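
The conditioning of the English plural described above can be written out as
a small decision procedure. The following is a sketch only. The segment
classes are a hypothetical, partial inventory with ASCII stand-ins for
phonemes (S, Z, tS, dZ, T for the palato-alveolars and the voiceless dental
fricative), and the function name plural_allomorph is illustrative. One
assumption departs from the wording above: the "fricative or affricate" case
is narrowed to the sibilants, so that stems ending in /f/ or /v/ come out as
they do in ordinary English.

```python
# Hypothetical, partial segment classes; ASCII symbols stand in for phonemes.
SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}   # assumption: only these take /Iz/
VOICELESS = {"p", "t", "k", "f", "T"}          # voiceless non-sibilants


def plural_allomorph(final_segment: str) -> str:
    """Return the plural realization conditioned by the stem-final segment."""
    if final_segment in SIBILANTS:
        return "Iz"   # the "otherwise" case
    if final_segment in VOICELESS:
        return "s"    # unvoiced stem-final segment
    return "z"        # voiced stem-final segment (vowels included)


for word, final in [("cat", "t"), ("dog", "g"), ("bus", "s"), ("judge", "dZ")]:
    print(word, "->", "/" + plural_allomorph(final) + "/")
```

The rule is systematic, since three cases cover every stem, but nothing in the
articulation of /s/ forces the particular outcome, which is the sense in which
it is a convention.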

Other systematic properties of language are "necessary"; that is, they
are essentially universal and (to a first and close approximation) could not
be other than they are. An example is the F0 contour of a vowel following a
voiced or voiceless stop. Following release of a voiced stop, the fundamental
frequency of the voice is low and gradually rises over a period of more than
100 msec (e.g., Hombert, Ohala, & Ewan, 1979; Ohala, 1978). After a voiceless
stop, F0 is high and gradually falls. The reasons for this patterning are not
fully understood, but it is generally agreed that the F0 contour is a
necessary consequence of the aerodynamic and articulatory adjustments made to
maintain or resist voicing during stop closure (Ohala, 1978). The F0 contour
following a stop is a systematic property of a word, but is not a convention
and is not expressed as a phonological rule in the phonologies of languages.

In the subsequent sections, I will focus on both necessary and
conventional phonological properties. Necessary systematic properties are
direct sources of evidence about articulatory constraints on production. For
this reason, they are very useful to study. However, I will focus primarily
on a second aspect of necessary properties--they may serve as a source of new
linguistic conventions as languages change. Thus it will be important to look
at the evolution of conventions to gain insight into necessary systematic
properties.


Leakages from Articulation into the Phonologies of Languages

Ohala has argued that exaggerated versions of necessary systematic
properties of languages occasionally enter the language as conventions due, in
his view, to systematic misperceptions by listeners. For example, Ohala
suggests (1974; 1981) that tone languages such as Punjabi may have evolved
from atonal languages with voicing distinctions among stop consonants.

This evidence derives from comparisons of related languages, one of which
is a tone language and the others of which are not. Punjabi, for example, is
a tone language related to Hindi and other languages that are not. In
Punjabi, the distinction between aspirated voiced consonants and unaspirated
unvoiced consonants, present in Hindi, is absent. Words starting with an
aspirated voiced consonant in Hindi have a low tone on the vowel in Punjabi.
In the history of Punjabi, apparently, the distinction between voiced
aspirated and unvoiced unaspirated consonants was lost, leaving behind a tonal
distinction between words formerly differing in voicing of the initial
consonant.



Ohala ascribes this sound change to consistent misperceptions by
listeners. Hearing the F0 contours produced by voiced and voiceless consonants
on following vowels, language learners may have interpreted the contours
mistakenly as systematic conventions. Consequently, when these listeners
produced voiced or voiceless stop-initial syllables, they intentionally
produced a tone on the following vowel. Being exaggerated, the contours were
more consistent than the ones that accompany stop voicing or voicelessness.
As numbers of language learners made the error (uncorrected for unexplained
reasons), syllables differing in voicing of the initial consonant were marked
in two ways--one by the voicing distinction itself and the other by the tonal
pattern on the vowel. In some languages, the tonal contours replaced the
voicing difference as the critical difference between certain syllables.
These languages became tone languages. Ohala (1981) offers many other
examples where conventions apparently entered languages as exaggerations of
necessary systematic properties of speech. (See also Wright's [1980] analysis
along similar lines of the [continuing] vowel shift in English.)

If the examples are real, they imply that some systematic conventions
that are popular among unrelated languages may reflect exaggerations of
necessary regularities in speech production and hence in fact may provide
clues to the identity of some of these regularities. Review of the
phonological literature reveals several systematic properties suggestive of
the mode of vowel production proposed here to underlie (in part) the
impression of rhythmicity of speech. As we have characterized (stressed)
vowel production, it has two central aspects. Vowels' leading and trailing
edges are overlaid by consonants, and vowels are produced as a cyclic stream
somewhat separate from the production of consonants. Reflections of both of
these aspects can be found in the phonologies of languages. I know of no
conventions that contradict the proposed mode of vowel production.

Language Conventions Suggestive of Continuous Vowel Production 

Vowel shortening and lengthening. A number of languages have adopted
conventions whereby consonant and vowel length serve a distinctive function in
the language. (That is, a long vowel, V:, or long consonant, C:, is
considered a different vowel or consonant from its short counterpart.) In
some of these languages, rules ensure that consonant and vowel length are
complementary. These rules may constitute exaggerations and
conventionalizations of the shortening effects of consonants on vowels
depicted in Figure 3.

For example, Swedish distinguishes long and short versions of vowels and
consonants phonologically. In Swedish, constraints on syllable structure
prevent long postvocalic consonants and long vowels from cooccurring in a
syllable and they prevent short vowels and (only) short postvocalic consonants
from cooccurring in stressed syllables (Elert, 1964; cited in Lindblom & Rapp,
1973). Allowed stressed syllable structures are (C)V:(C) and (C)VC:(C).
(Parentheses indicate that segments are optional.) This reciprocal
relationship between vowel and consonant length at the phonological level of
description of the language is not the same as the (generic) shortening
depicted in Figure 3. Lindblom et al. (1981) show that Swedish long vowels
are shortened by intra- or transsyllabic consonants, just as English vowels
are. But the phonetic shortening of the long vowels does not transform them
into phonologically short vowels. (Thus, although V: in V:C is shorter than
V: in isolation, both are phonologically long vowels.) In Swedish, then, a
reciprocal relation exists between consonants and vowels at two levels--at a
phonetic level where it also occurs generally across languages, and at a
phonological level where it is a convention special to Swedish.
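
The phonological constraint on Swedish stressed syllables can be stated as a
pattern over consonant-vowel skeletons. The following is a sketch under
stated assumptions: a syllable is written as a string over C, V, and the
length mark ":", each parenthesized C in the two structures given above is
read as at most one optional consonant, and the names ALLOWED and
is_allowed_stressed_syllable are illustrative.

```python
import re

# The two allowed stressed-syllable structures: (C)V:(C) and (C)VC:(C).
ALLOWED = re.compile(r"C?(?:V:C?|VC:C?)")


def is_allowed_stressed_syllable(skeleton: str) -> bool:
    """True if the CV skeleton has a long vowel or a long consonant, not both."""
    return ALLOWED.fullmatch(skeleton) is not None


for skeleton in ["CV:C", "CVC:", "CV:C:", "CVC"]:
    print(skeleton, is_allowed_stressed_syllable(skeleton))
```

The two rejected skeletons are the cases the constraint excludes: a long vowel
with a long postvocalic consonant, and a short vowel with only a short
postvocalic consonant.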

Yawelmani, a native American language once spoken in California, like
Swedish, distinguishes phonologically long and short vowels. Also like
Swedish, Yawelmani maintains a reciprocal relation between vowel length and,
in this case, the number of following consonants. In Yawelmani, a
phonologically long vowel in a stem is made short if a suffix is added to the
stem causing the stem vowel to be followed by more than one consonant.

According to Kenstowicz and Kisseberth (1979): "Examination of a
variety of other languages reveals that alternations in [phonological] vowel
length typically revolve around differences in the consonant-vowel structure
of words, with long vowels preferred in "open syllables" (__CV) and short
vowels preferred in "closed syllables" (__CC)" (p. 83). This is just what we
would expect if languages tend to conventionalize, by exaggeration, properties
of production that already are necessarily systematic in language. By virtue
of the coproduction of vowels and consonants in syllables, vowels are overlaid
by consonants, leading to their measured shortening. In many languages, vowel
length is made phonologically distinctive and, in some of these languages
(Swedish, Yawelmani, and others), rules conventionalize the reciprocal
relation between vowel duration and consonant duration.

Historical sound change. Some historical sound changes reflect a similar
reciprocal relation between vowel length and the vowel's consonantal context.
These changes are called "compensatory lengthening" (e.g., Ingria, 1980) and
are occasions where a consonant is lost in a word or set of words and a vowel
in the vicinity of the consonant, formerly phonologically short, becomes long.
This occurred both in Latin and Greek. Both languages lost /s/ in certain
contexts. In Latin, /sisdo:/ became /si:do:/, for example, and in Greek,
/ekrinsa/ became /ekri:na/ (Ingria, 1980). Phonetically, loss of a consonant
should "uncover" part of a vowel's produced extent, giving it a longer
measured duration. The historical change appears analogous except that the
lengthening of the vowel is phonological. (However, see deChene & Anderson,
1979, for a skeptical look at the historical phenomenon of compensatory
lengthening.)

Vowel infixing and vowel harmony. Languages reveal two other
conventional structures suggestive of the basic organization of consonants and
vowels that we have suggested. In contrast to the conventions just described,
which reflect (so I suppose) the overlap of consonants and vowels in
production, the following conventions may reflect the separateness of the
vowel "stream" from the production of consonants. In particular, they are
conventions in which phonetically nonadjacent vowels are treated in some
respects as if they were adjacent (and hence a separable stream from the
consonants).

In Arabic (McCarthy, 1981), derivationally related words may share a
triconsonantal root. For example, words in which "ktb" occurs all have to do
with the concept "to write." Examples of words are /katab/, /ktaabab/,
/kutib/, /uktab/. McCarthy does an analysis of these word systems in which
separate vocalic and consonantal tiers are proposed to underlie word
generation.

To generate a particular verb form in Arabic, three choices are made. 
The choice of the triconsonantal root determines the word family. The choice
of a "prosodic template" selects the derivational form of the verb. Finally, 
selection of a vocalic infix determines the voice and aspect of the verb. 

The prosodic template is a word schema that specifies the numbers and
orderings of the consonants and vowels in the word (e.g., CVVCVC). Some
templates have more vowel slots than vowels in the infix and more consonant
slots than consonants in the root. In general, consonants in the root are
assigned left-to-right to the C slots and vowels in the infix left-to-right to
the V slots of the template. If there are unfilled C or V slots, the
rightmost consonant or vowel is "spread" to the unfilled slots of the
appropriate type. So, for example, /ktb/ and the infix /a/ (perfective,
active), inserted into the template CVCVC, give /katab/ ("write"); inserted
into CCVCVC, they give /ktabab/.

McCarthy has captured this system's structure using a so-called
"autosegmental" analysis (Goldsmith, 1976). An autosegmental approach differs
from the usual segmental/suprasegmental approach in allowing several segmental
tiers to underlie the expression of an utterance. Traditionally one or two
are allowed: one for phonological segments, and, perhaps, another for tonal
contours and other aspects of prosody. However, according to Goldsmith,
utterances cannot be sliced vertically (perpendicular to the time axis) in
such a way that the utterance is partitioned into coherent units. Instead,
different features of the utterances start and stop at their own individually
appropriate intervals and to a degree independently of the startings and
stoppings of other features. In an autosegmental formulation, properties
regulated separately are assigned to different tiers of a structure
representing the utterance. The different tiers are related by simple rules
of association.

In McCarthy's analysis, vowels and consonants are assigned to separate
tiers. So, for example, /katab/ is represented by the structure in Figure 8a
and /ktabab/ by that in Figure 8b. In this kind of formulation, the
"spreading" to unfilled consonant or vowel slots now can literally be a
spreading. For /a/, there are no relevant segments (see discussion below of
the Relevancy Condition) intervening between two V slots.
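
The left-to-right association of root and infix to a prosodic template, with
spreading of the rightmost segment, can be sketched as follows. This is an
illustration of the two cases cited in the text only, not an implementation of
McCarthy's full analysis; the function name fill_template is hypothetical, and
each segment is assumed to be a single character.

```python
def fill_template(template: str, root: str, infix: str) -> str:
    """Associate root consonants and infix vowels to C and V slots, left to right.

    Each tier is consumed independently of the other. When a tier runs out
    of segments, its rightmost segment is spread to the remaining slots of
    that type.
    """
    tiers = {"C": list(root), "V": list(infix)}
    used = {"C": 0, "V": 0}
    segments = []
    for slot in template:
        tier = tiers[slot]
        # Reuse the last segment of the tier once the tier is exhausted.
        index = min(used[slot], len(tier) - 1)
        segments.append(tier[index])
        used[slot] += 1
    return "".join(segments)


print(fill_template("CVCVC", "ktb", "a"))    # katab
print(fill_template("CCVCVC", "ktb", "a"))   # ktabab
```

Because the consonantal and vocalic tiers keep separate counters, the single
vowel /a/ fills both V slots without regard to the consonants that intervene,
which is the sense in which the spreading is literal.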

This autosegmental structure, proposed by McCarthy, obviously is
compatible with the articulatory dynamics proposed to underlie syllable
production. It differs from the structure, however, in being a convention of
Semitic languages, not a necessary property of syllable production.
Nonetheless, its existence suggests that of an underlying necessary property
of production not unlike the one proposed in Part I.

Another, more frequent, language convention possibly reflecting the same
articulatory structure is "vowel harmony"--that is, a tendency for certain
vowels to assimilate to other vowels in their neighborhood. Vowel harmony
occurs in many languages, including Turkish, Hungarian, Yawelmani, and Igbo.
In Turkish, for example, a suffix vowel is assimilated in backness and
rounding to the vowel of the stem to which it is attached. Rules of vowel
harmony operate over any number of intervening consonants. Thus, vowel
harmony, like vowel infixing, is captured naturally in an autosegmental
analysis in which vowels and consonants occupy separate tiers.

Vowel harmony may be an instance of a class of rules tending to conform
to a constraint on phonological rules known as the "Relevancy Condition"
(Jensen, 1974; Jensen & Stong-Jensen, 1979). The constraint specifies the
conditions under which phonological rules can refer to influences of segments
on nonadjacent segments ("action at a distance").

Figure 8. Vowel infixing in Arabic from McCarthy (1981): (a) /katab/ on the
template CVCVC; (b) /ktabab/ on the template CCVCVC.

Phonological rules may be characterized as having the following abstract
form:

    focus -> structural change / determinant, irrelevant segments, ___



For example, a rule of vowel harmony in Yawelmani can be written as follows:

    [+syll, αhigh] -> [+round, +back, -low] / [+syll, +round, αhigh] C0 ___

In words, a vowel (focus) is realized as rounded, back and nonlow (structural
change) following a rounded vowel matching it in height (determinant) and
any number of intervening consonants (irrelevant segments). According to the
Relevancy Condition, any features shared by the focus and the determinant
(here, any vowel) define a class of "relevant segments." The complement of
that class, the irrelevant segments, serves as the "distance" over which a
phonological segment exerts its effect. The influence cannot skip over
relevant segments. Hence in the Yawelmani harmony rule, the irrelevant
segments skipped over are all and exclusively consonants.
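
The way the harmony rule reaches across consonants but never across vowels can
be made concrete with a toy example. The following is a sketch under stated
assumptions: the four-vowel inventory and its feature values are a
simplification, the input strings are made-up skeletons rather than attested
Yawelmani words, the rule is assumed to apply left to right so that a newly
rounded vowel can itself act as determinant, and the name harmonize is
illustrative.

```python
# Toy vowel inventory: symbol -> (is_high, is_round). Illustrative only.
VOWELS = {
    "i": (True, False),
    "u": (True, True),
    "a": (False, False),
    "o": (False, True),
}
# Rounded, back, nonlow counterpart of each unrounded vowel.
ROUNDED = {"i": "u", "a": "o"}


def harmonize(word: str) -> str:
    """Round a vowel after a rounded vowel of the same height, skipping consonants."""
    output = []
    previous_vowel = None  # nearest relevant segment to the left
    for segment in word:
        if segment in VOWELS:
            if previous_vowel is not None:
                prev_high, prev_round = VOWELS[previous_vowel]
                high, is_round = VOWELS[segment]
                if prev_round and not is_round and prev_high == high:
                    segment = ROUNDED[segment]
            previous_vowel = segment
        # Consonants are irrelevant segments: they pass through unchanged
        # and do not reset the determinant.
        output.append(segment)
    return "".join(output)


print(harmonize("tumpi"))   # heights match: i is rounded
print(harmonize("tumpa"))   # heights differ: a is unchanged
print(harmonize("tompa"))   # heights match: a is rounded
```

Only the nearest vowel to the left is ever consulted, so the influence passes
over any number of consonants but cannot skip a relevant segment.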

Conceivably, the relevancy conditions of a language may be useful in
defining its autosegmental tiers. The relevant segments defined by a rule may
define segments that share a tier and irrelevant segments define a different
tier or tiers. If so, it is interesting that in the examples of rules
conforming to the constraint provided by Jensen and Stong-Jensen (1979),
relevant segments are either consonants only or vowels only, never both.

CONCLUSIONS 

Talkers 

When talkers produce sequences of stressed vowels and consonants,
production of the two segment types overlaps. This is shown by coarticulatory
evidence, evidence of measured shortening of vowels in consonantal contexts
and, by inference, by the existence of phonological rules in some languages
that ensure a complementary relation between consonant and vowel length.

In addition, evidence suggests a degree of separateness of vowel from
consonant production, which in fact allows the overlap just described.
Evidence for the separation of vowel from consonant production is threefold.
Coarticulation suggests it, the patterning of speech errors suggests it, and
so, inferentially, does the existence of phonological rules in which an
autosegmental analysis distinguishes a vocalic from a consonantal tier.

When talkers intend to produce a rhythmic sequence of stressed
monosyllables, evidence suggests they produce evenly timed vowels. Timing of
syllable-initial consonants depends on the ways in which consonants or
clusters are produced relative to vowels. A relaxed cyclicity in production
of stressed vowels in natural speech may explain in part the impression of
temporal rhythm in stress- and syllable-timed languages.

As to why talkers might produce speech in this way, only tentative
answers may be given. Liberman and Studdert-Kennedy (1978) suggest that
speech is coarticulated ("encoded") for the listener's sake. Speech has to be
produced at a rapid rate to enable retention of sufficient speech for
syntactic analysis. But at the required rate, were speech a sequence of
discrete sounds, listeners would be unable to recover the segments or their
order (see, e.g., Warren, 1976). Coarticulation allows a large number of
relatively long sounds to occupy the same interval as a much smaller number of
shorter, but temporally discrete, segments. We have shown here that listeners
make use of information for a vowel during the portion of the signal dominated
by consonant information. This is entailed by the proposal of Liberman and
Studdert-Kennedy that coarticulation facilitates the perceptibility of
serially-ordered speech sequences (see also Shankweiler, Strange, & Verbrugge,
1977).

A second reason for separate vowel and consonant production may have to
do with production rather than perception. Elsewhere (Fowler, 1977; Fowler,
Rubin, Remez, & Turvey, 1980) I have proposed that talkers may exploit the
fact that vowels constitute a natural articulatory class. All vowels, in
contrast to consonants, are produced as relatively slow changes in the global
shape of the vocal tract effected largely by movements of the tongue body and
jaw.

Each particular vowel itself is a class of tongue body and jaw positions
that yield approximately the same global vocal tract shape. This is shown by
perturbation studies where, for example, talkers produce vowels clenching a
bite block between the teeth so that the jaw is fixed. In these studies, the
acoustic properties of the vowels are near-normal (e.g., Fowler & Turvey,
1980; Lindblom, Lubker, & Gay, 1979) suggesting that tongue movement has
compensated for the inability of the jaw to move. It is shown, too, by
studies of coarticulation where positioning of the jaw in CV and VC syllables
is affected jointly by the identity of the consonant and vowel (Sussman et
al., 1973). These observations are displayed schematically in Figure 9. In
the figure, each vowel is represented as a curve in a jaw-tongue coordinate
space. This is meant to show the capacity that a speaker has to achieve any
given vowel by a class of jaw positionings and tongue positionings relative to
the jaw. Due to this capacity, when a bite block prevents jaw movement, or
when a consonant perturbs it, all is not lost; an acceptable version of the
vowel is achieved by adjusting the tongue to the special constraints on jaw
position.

Vowels differ one from the other largely (but not entirely) in terms of
the tongue-body's positioning (front/back, high/low) relative to the palate.
The idea that vowels constitute a natural articulatory class is indicated in
Figure 9 by showing /i/, /ɛ/, and /æ/ as if the functions for each vowel
relating jaw position to the position of the tongue relative to the jaw were
parallel. By hypothesis, producing a vowel, any vowel, involves organizing
the musculature of the jaw and tongue body so that the two structures work in
a compensatory fashion. Producing a particular vowel may be modeled as
choosing a parameter value for the jaw-tongue relationship that ensures an
"equilibrium position" for the jaw-tongue system appropriate to the selected
vowel.

This proposal is analogous to Bizzi's (1978) hypothesis that pointing to
positions by monkeys is achieved when the monkey establishes appropriate
levels of activation of agonist and antagonist muscles in the arm.
Appropriate activation levels create an equilibrium position of the arm (that
is, the position of the arm when the opposing muscle forces balance) at the
target position.

Figure 9. Schematic representation of constraints on the jaw and tongue
during production of vowels /i/, /ɛ/, and /æ/ and on the jaw and
lips during bilabial consonant production. A vowel is produced by
a range of negatively correlated jaw and tongue positionings that
yield the same tongue-palate approximation. Similarly, a bilabial
stop is realized by a variety of negatively correlated jaw and lip
positionings that achieve bilabial closure (e.g., Folkins & Abbs,
1975).

What would such a system buy a talker? First, establishing a compensatory
relationship between jaw and tongue may constitute an example of a general
way in which movement systems responsible for reproducing positions (as
opposed to movements) tend to be organized. The organizations have the
advantage of "equifinality"--that is, of enabling achievement of the goal
position in a variety of ways without requiring reorganization (see, e.g.,
Keele, 1960; Kelso & Holt, 1980). This makes vowel production
context-sensitive.

Second, the aspects of vowel organization that hypothetically are shared
among vowels may buy the talker an increment in efficiency in facilitating
cyclic vowel production. Cyclic activities such as locomotion and respiration
(see Grillner, 1977) are efficient in terms of the motor organizations they
require. In locomotion, muscle systems are organized to generate a step.
Once so organized, the same muscle systems will produce an indefinite number
of subsequent steps without requiring any change in organization. Cyclic
vowel production may provide another example of this kind of motor
organization. If it is possible for a talker to coordinate his or her tongue
and jaw in a compensatory fashion but also in a way that is general to the
class of vowels, then once established, the organization can serve the
production of vowels throughout an utterance, individual vowels being
produced by cyclic reparameterizations of the tongue-jaw system.

Of course, this proposal currently begs a number of critical questions:
Most importantly, how might the muscles of the jaw and tongue be coordinated
in a compensatory fashion? Second, is the notion of a difference in values of
parameters of an invariant organization of muscles a realistic way to describe
the different jaw-tongue relations characteristic of different vowels?

However, if vowel production were cyclic, it would help to rationalize
the linguist's and naive listener's judgments of rhythm in speech. Indeed,
this is our tentative proposal, based on studies of monosyllabic stress feet,
and subject to revision when we turn to more natural productions (Fowler, Note
1).

Listeners

The most important conclusion to be drawn about listeners' perception of
rhythmic speech is that it mirrors the natural structure of the spoken
utterance. Listeners hear speech sequences largely as talkers produce them
and essentially as talkers intend them to be heard.

Doing so involves hearing through coarticulatory overlap of segments, and
we have shown at least one circumstance in which listeners appear to do just
that (Experiment 2). We have proposed that their hearing through
coarticulation is analogous to their perceptual segmentation of visually
complex events and involves something like a perceptual vector analysis of
the acoustic speech stream.

[...] interpretation, listeners hear isochronous speech when talkers produce
it by attending to acoustic information specifying timing of (stressed) vowel
production. In the isochronous sequences of stressed monosyllables, talkers
produce vowels cyclically and listeners attend to the timing of vowels.

Measurement

As we have suggested elsewhere (Fowler & Tassinary, 1981), conventional
measurements of phonological segments and measures of acoustic segments do not
always reflect the psychological structure of the spoken or perceived
utterance. This is not because (or only because) listeners "interpret" the
acoustic message while measurements are "objective" assessments. Rather,
there are other possible objective segmentations of a signal than conventional
ones, and the listener's perspective on the signal may constitute an
alternative objective segmentation. In particular, conventions for
measurement in which phonological segments are demarcated as if they were
temporally discrete do not reflect the possibly equally objective
perspectives that respect coarticulatory overlap. The judgments of listeners
may in the future guide decisions concerning natural measurement criteria for
speech.

Sources of Evidence 

Products of linguistic analysis offer a reservoir of evidence, largely
untapped by psychologists, that can converge with evidence obtained from
experimental investigation. Although the procedures of phonological analysis
are nonexperimental, the products of the analysis, systematic phonological
properties of languages, are behavioral systematicities because they reflect
language use. As such, they are relevant to psychological theories of
language use including theories of speech production and perception.

Here we have used evidence from phonological analysis of language to
buttress proposals that the talker's overlap of vowels and consonants is
perceptually real and that separate, perhaps cyclic, vowel production is
sufficiently real for language users that it gives rise to analogous
phonological phenomena.

FOOTNOTES

¹A foot is a unit of metrical structure in speech consisting of a strong
syllable and one or more weak syllables. In English, the weak syllables of a
foot always follow the strong syllable. A mora is a "light" syllable (that
is, a short vowel optionally preceded by a consonant) or it is part of a
"heavy" syllable; a heavy syllable consists of a syllable-initial consonant,
if any, a long vowel or a short vowel and a post-vocalic consonant, and is two
morae in length.

²The data in Figure 5b were collected from a single talker (the author)
who produced CVC syllables in a carrier phrase.

³Further evidence in support of the view that vowel and consonant
production are separate is available in the literature on speech errors.
Anticipation errors, perseverations, exchanges, and substitutions never
involve interaction between consonants and vowels. Instead, vowels intrude on
other vowels and consonants on consonants.

⁴This may be an oversimplification in two senses. First, vowels shorten
for some reasons having nothing to do with coarticulation--for example, when
speech rate increases. Therefore, whereas coarticulation implies shortening,
the reverse need not be true. Second, stressed vowels coarticulate with
consonants and with unstressed vowels that precede or follow them (e.g.,
Fowler, 1981a, 1981b; and see Experiment 2 below). To coarticulate with an
unstressed vowel, stressed vowel production necessarily extends throughout
(and beyond) production of a medial consonant at least in utterances where an
unstressed vowel precedes the stressed vowel. But the vowel's measured
shortening is less than the full extent of its overlap by other segments
(again, at least in utterances including unstressed vowels). Possibly, the
effective duration of a stressed vowel for a listener does not include the
entire period of time during which it influences the acoustic signal.

⁵This experiment was carried out in collaboration with Louis Tassinary
and has been summarized in Fowler and Tassinary (1981).

⁶We attempted to create continua using the syllables of the third talker
in the metronome study. However, we were not successful in creating continua
of syllables that listeners could label consistently.

⁷This prediction requires clarification. The observation that vowel
shortening in /pV/ is greater than in other syllables is true if vowel onset
is defined as the onset of voicing following release of a syllable-final
consonant. If the onset were located instead at the onset of the formant
transitions following release of the /p/--an equally defensible location
because the transitions provide vowel information as well as being sufficient
to specify the /p/ to a listener--the rank ordering would change. However, it
is not necessary for the aims of the present experiments to be met to defend
either of these measuring points as superior. Indeed, according to the
present arguments, any measuring point is indefensible that purports to divide
an acoustic signal into nonoverlapping phonetic segments. The aims of the
experiments can be met if a reference point is selected and used consistently
in assessing syllable-timed productions (Figure 2), judgments of vowel
duration (Experiment 1), vowel and consonant classification (Experiments 2 and
3) and syllable-timing judgments (Experiment 3). If syllables are aligned
similarly around the selected reference point for syllable-timed productions
and judgments as for assessments of vowel durations and for vowel
classifications, but not for consonant classifications, then the conclusion is
warranted that syllable timing is related to vowel sequencing more than to
consonant sequencing.

⁸Louis Goldstein (personal communication) has suggested a reason for
this. Locke's research (e.g., 1979) on the so-called "fis" phenomenon in
children reveals that, immediately after producing a word, children are more
aware of what they meant to say than of what they in fact uttered. Locke's
research focuses on children whose speech does not seem to distinguish pairs
of sounds (e.g., /w/-/l/ or /r/-/w/) that are distinct in adult language.
After having produced something like /weyk/ meaning "rake," they will deny
having said "wake." But if their production is recorded and replayed to them
one day later, they are no better than other listeners in distinguishing their
"wakes" from their "rakes."

⁹I am grateful to Judy Kagl for pointing out the relevance of McCarthy's
analysis to my proposal that vowel production is continuous.

¹⁰I thank Alan Ball for directing me to the work of the Jensens.

¹¹In Figure 9, I have drawn the curves for each vowel as if they were
straight lines, and the lines for different vowels as if they were parallel.
There is no reason to suppose that either constraint is accurate. The lines
are meant to serve as schematic representations.




SOME DIFFERENCES BETWEEN PHONETIC AND AUDITORY MODES OF PERCEPTION*

Virginia A. Mann and Alvin M. Liberman

Abstract. When third-formant transitions are appropriately incorporated
into an acoustic syllable they provide critical support for
the phonetic percepts we call [d] and [g], but when presented in
isolation they are perceived as time-varying 'chirps.' In the
present experiment, both modes of perception were made available
simultaneously by presenting the third-formant transitions to one
ear and the remainder of the acoustic syllable to the other. On the
speech side of this duplex percept, where the transitions supported
the perception of stop-vowel syllables, perception was categorical
and influenced by the presence of a preposed [al] or [ar]. On the
nonspeech side, where the same transitions were heard as 'chirps,'
perception was continuous and free of influence from the preposed
syllables. As both differences occurred under conditions in which
the acoustic input was constant, we should suppose that they reflect
the different properties of auditory and phonetic modes of
perception.

In the phonetic domain, the relation between acoustic cue and percept has
several characteristics that have been taken to imply a special mode of
processing (for recent reviews, see: Liberman, 1982; Liberman &
Studdert-Kennedy, 1978; Repp, 1982; Studdert-Kennedy, 1980; but see, for
example: Kuhl, 1981; Kuhl & Miller, 1975; Miller, 1977). One such
characteristic is that frequency-modulated acoustic cues are integrated with
other cues into unitary percepts that seemingly lack the qualities we might
have been led, on purely psychoacoustic grounds, to expect. A case in point,
and the one with which we will be concerned, is in the perception of the stop
consonants [d] and [g]. As has long been known, sufficient cues for the
perceived distinction between these phones are transitions--that is,
frequency modulations--of the second or third formants. Thus, when
appropriate transitions of the third formant--the cue that will be the
subject of our investigation--are presented in an otherwise fixed acoustic
context, listeners perceive a syllable consisting of [d] or [g], followed by
a vowel. Of special interest to us is that one hears in these percepts none
of the time-varying quality--a 'chirpiness,' for example, or a
glissando--that might be thought to correspond to the time-varying nature of
the frequency-modulated signal. Indeed, one finds it difficult to
characterize the [d] and [g] percepts, and especially the differences between
them, in auditory terms of any kind. It is as if the percepts were as
abstract as the phonetic segments they represent.

We might nevertheless account for the percepts without reference to
specialized processes of a phonetic sort. Thus we might assume, most simply,
a low-level process of sensory integration, similar, perhaps, to the
integration of intensity and time into the perception of loudness. But such
an assumption is ruled out by the finding that listeners do, in fact, hear the
to-be-expected chirps and glissandi when the transition cues are removed from
the larger context and sounded alone (Mattingly, Liberman, Syrdal, & Halwes,
1971). Still, we might save an auditory account by noting that the
transitions are normally presented in a larger acoustic context, and that
they are, therefore, subject to the effects of a purely auditory interaction
with the remainder of the pattern. On that account, the peculiarly abstract
character of the percept would be thought to emerge from the interaction.
Nothing we know about auditory perception suggests the existence of such an
interaction, but the possibility is not precluded.

There is, in any case, another characteristic of the way formant
transitions function when they cue stop consonants: the phonetic percepts
they support are appropriate to their role in language, not only in their
abstractness, but also in the extent to which they are categorical. Given
transitions that change in relatively small physical steps, from one
appropriate for [d] to one appropriate for [g], the percept changes, not in
correspondingly small steps, but suddenly (Liberman, Harris, Hoffman, &
Griffith, 1957; Mattingly et al., 1971; Repp, in press; Studdert-Kennedy,
Liberman, Harris, & Cooper, 1970). This nearly categorical shift marks a
sharp boundary between the two phones [d] and [g]; it is commonly reflected
and measured as a relative increase in discriminability of the stimuli at the
category boundary. But such tendencies toward categorical perception do occur
in nonspeech perception as well (see, for example: Burns & Ward, 1978; Locke
& Keller, 1973; Miller, Wier, Pastore, Kelly, & Dooling, 1976; Parks, Wall, &
Bastian, 1969; Siegel & Siegel, 1977), so the question is not whether it is
unique to the perception of stop consonants (and other phonetic segments),
but, more properly, whether the categorical boundary between the phonetic
segments is of an auditory sort. We have reason to believe it is not, for
when the same formant transitions are presented in isolation (and perceived as
nonspeech chirps), the obtained discrimination function is continuous--that
is, it does not display the abrupt peaks and troughs that typify categorical
perception. This result has been obtained in adults (Mattingly et al., 1971)
and in infants (Eimas, 1974). It follows, then, that if the categorical
effect in the full speech context is to be assigned a purely auditory cause,
then, as in the previously noted case, it must be referred, ad hoc, to some
assumed auditory interaction between the transitions and the remainder of the
acoustic pattern.

A quite different characteristic of the way formant transitions cue [d]
and [g] is that their effects are subject to the influences of phonetic
context. Thus, given abutting vowels, the transition must, of course, move
into or out of the vocalic nucleus; hence, the boundary between [d] and [g]
will occur in transitions that are at different positions on the spectrum for
different vocalic contexts (Delattre, Liberman, & Cooper, 1955; Liberman,
Delattre, Cooper, & Gerstman, 1954). More relevant to our concern here,
however, is the fact that, given a fixed continuum of formant transitions, a
shift in the [d-g] boundary can be produced by neighboring consonants. Such
effects have been found with preposed fricatives (Mann & Repp, 1981; Repp &
Mann, 1981) and across a syllable boundary with preposed [al] or [ar] (Mann,
1980). In both cases, the shift in the position of the boundary was found to
be consistent with the way the formant transitions for [d] and [g] are
affected in normal speech by coarticulation with fricatives or with liquids.
Therefore, the movement of the category boundary is most plausibly to be
understood as a perceptual compensation for the effects of coarticulation. As
such, it would presumably reflect a phonetic rather than an auditory process.
To appeal, instead, to an auditory interaction would require not only that we
set aside the coarticulatory facts, together with the reasonable
interpretation based on them, but also that we make a seemingly unreasonable
assumption about why speech perception finds parallels in speech
production--to wit, that speakers adjust the behavior of their articulatory
organs so as to produce in every context just those acoustic effects that
will fit boundary shifts caused by pre-existing auditory interactions. Such
an interpretation becomes, in the end, hopelessly ad hoc and, given what we
know of constraints on articulation, quite implausible. But, again, it
cannot, in principle, be ruled out.

To control for auditory interaction, we should contrive acoustic patterns
that can, depending on specifiable circumstances, be perceived either as
speech or as nonspeech. Two techniques are available for this purpose, and
both have been used in other studies to gain the control we seek. One employs
stripped-down versions of synthetic speech that can be heard as speech or
nonspeech, depending on the natural proclivities of the listeners, how long
they have been listening, and just what has or has not been suggested to them
(Best, Morrongiello, & Robson, 1981; Remez, Rubin, Pisoni, & Carrell, 1981).
The other method, and the one we will use, takes advantage of a phenomenon in
which, with auditory input held constant, the acoustic cue of interest is
perceived simultaneously as a nonspeech chirp and as critical support for a
phonetic segment. This phenomenon, called 'duplex perception,' was first
reported by Rand (1974). Recently, it has been further studied in an
investigation of the cues for the liquids [l] and [r] (Isenberg & Liberman,
1978), and it has been used to control for auditory interaction in a study of
silence as a cue for stop consonants (Liberman, Isenberg, & Rakerd, 1981).
Here, we will exploit it to provide an appropriate control for auditory
interaction in investigations of the third-formant transition as a cue for the
perceived distinction between [d] and [g]. In the first of these, we will be
concerned to find out whether the integration of such transitions into unitary
phonemic categories is to be attributed to processes of a generally auditory
sort, or whether it is the result of processes that are distinctively
phonetic. The second part of our study is designed to determine if
context-conditioned movement of the boundary between the [d] and [g]
categories is also to be regarded as a special attribute of phonetic
perception.

EXPERIMENT I

Our aim in the first experiment was to measure discriminability of
third-formant transitions on both sides of a duplex percept--that is, when,
on the 'speech' side, the transitions provide crucial support for the
perceived difference between [da] and [ga], and when, on the 'nonspeech'
side, they are heard as unspeechlike 'chirps.' The stimulus patterns were
three-formant synthetic syllables in which the third formant varied in nine
steps, from a setting appropriate for [da] to one appropriate for [ga].

To produce duplex perception of these third-formant transitions, we
separated them from the (fixed) remainder of the pattern--which we will, for
convenience, call the "base"--and presented the separated constituents
dichotically. Thus, the transitions, which in isolation sound like chirps,
and the base, which in isolation sounds like a syllable (most commonly, [da]),
are free to mix and hence to interact in the listener's nervous system. The
usual result is two percepts, present simultaneously. On one side of this
duplexity is a syllable, [da] or [ga], which is perceptibly different from
the base but very similar, perhaps identical, to what is heard when the two
constituents (transition and base) are mixed electronically and presented in
the normal manner (Liberman et al., 1981; Repp, Milburn, & Ashkenas, 1982).
On the other side is a nonspeech 'chirp' that seems identical to what is
heard when the transition is presented in isolation.

Given systematic variation in the formant transitions, we can measure
discriminability, hence tendencies toward categoricalness, of the resulting
speech and nonspeech components of the duplex percept. To the extent that
there is categorical discrimination of the formant transitions heard on the
speech side of the duplex percept, the discrimination function should have
marked peaks and troughs that accord with predictions derived from phonetic
labeling responses (Liberman et al., 1957). To the extent that the phonetic
categories themselves have a purely auditory basis, the discrimination
function for the same formant transitions when heard on the nonspeech side of
the duplex percept should also have marked peaks and troughs and, like the
function for discrimination of speech percepts, should meet with predictions
derived from phonetic labeling.
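
A minimal sketch of how a discrimination function might be predicted from labeling data follows. It uses a formula that is widely cited in the categorical-perception literature, in which two stimuli are discriminated only to the extent that they receive different labels: predicted proportion correct = 0.5 * (1 + (p_a - p_b)^2), where p_a and p_b are the proportions of 'd' responses to the two stimuli. Whether this exact formula was applied in the present study is an assumption, and the labeling proportions below are invented for illustration.

```python
def predicted_proportion_correct(p_a, p_b):
    """Predicted discrimination from labeling proportions.

    p_a, p_b: proportion of 'd' labels given to stimuli A and B.
    Chance (0.5) is predicted when both stimuli are labeled alike.
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_function(labeling, step=3):
    """Predictions for every pair of stimuli `step` apart on the continuum."""
    return [
        predicted_proportion_correct(labeling[i], labeling[i + step])
        for i in range(len(labeling) - step)
    ]


# Invented proportions of 'd' responses for Stimuli 1 through 9.
example_labeling = [1.0, 1.0, 0.95, 0.8, 0.5, 0.2, 0.05, 0.0, 0.0]
example_prediction = predicted_function(example_labeling)
```

With a steep labeling function such as the invented one, the predicted values peak for the pairs that straddle the category boundary and fall toward chance for pairs within a category, which is the pattern of peaks and troughs described above.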


METHOD 

Materials

Stimulus continuum. At the top of Figure 1 is a schematic representation
of the stimulus patterns. These patterns, very similar to those used by Mann
(1980) in the study referred to in the Introduction, were designed to be
synthetic approximations to the syllables [da] and [ga]. They were produced
on the parallel resonance synthesizer at Haskins Laboratories. The lower half
of Figure 1 shows how the stimuli were divided into the two constituents--the
fixed 'base' and the variable 'isolated transitions'--that will, when
presented dichotically, produce the duplex percept. The base is 250 msec in
total duration, with a 50-msec ramp in overall intensity at onset and offset,
and a fundamental frequency that falls linearly from 110 to 80 Hz. The first-
and second-formant transitions are 30 msec in duration and step-wise linear in
3-msec steps; they begin at 279 and 1764 Hz, arriving finally at steady-state
values of 765 and 1230 Hz, with bandwidths of 60 and 80 Hz, respectively. The
third formant of the base begins 50 msec later than the others and maintains a
steady state at 2527 Hz with a bandwidth of 120 Hz. In accordance with
natural speech, this third formant is slightly less intense than the other
two.

Figure 1. Schematic representation of the patterns used to produce the duplex
percepts, including the constant base portion and the continuum of
nine formant transitions. (Upper panel: normal (binaural)
presentation, Formants 1 to 3, [da] to [ga]. Lower panel:
duplex-producing (dichotic) presentation, with the base to one ear
and the isolated transitions to the other ear.)

The continuum of nine formant transitions was synthesized separately from
the base. Each transition is [illegible] msec in duration and step-wise
linear in [illegible]-msec steps; fundamental frequency and amplitude contour
are as in the first 50 msec of the base stimulus, the offset frequency is the
steady-state third-formant frequency of the base, and the bandwidth is 120 Hz.
Onset frequency systematically varies across the continuum in eight equal
steps, from 3196 Hz in Stimulus 1 to 1855 Hz in Stimulus 9. As can be seen in
the figure, the first four transitions have falling slopes, the fifth is flat,
and the final four are rising. The slopes of the four rising transitions are
equal in value to the slopes of the transitions that fall. For convenience,
we will refer to the transitions hereafter by number, as shown in the figure,
from most falling to most rising.
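
The onset frequencies of the nine transitions can be reconstructed from the two endpoint values, as in the following sketch. It assumes that "eight equal steps" means steps of equal size in Hz (rather than, say, equal steps on a logarithmic scale); the function name and its parameters are hypothetical.

```python
def onset_frequencies(first_hz=3196.0, last_hz=1855.0, n_stimuli=9):
    """Third-formant onset frequency (Hz) for each stimulus on the continuum.

    Assumes the continuum is divided into equal steps in Hz.
    """
    step = (last_hz - first_hz) / (n_stimuli - 1)  # negative: onsets fall
    return [first_hz + i * step for i in range(n_stimuli)]


onsets = onset_frequencies()
for number, hz in enumerate(onsets, start=1):
    print(f"Stimulus {number}: onset {hz:.1f} Hz")
```

Under this assumption each step is 167.625 Hz, and the onset of Stimulus 5, the flat transition, falls at the midpoint of the range, 2525.5 Hz.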

Test tapes. The base stimulus and the continuum of transitions were
digitized at 10,000 Hz prior to being recorded onto magnetic tape for the
purpose of testing. As was appropriate for dichotic presentation (and duplex
perception), the base was recorded onto one track, the isolated transitions
onto the other.

A (duplex perception) labeling tape was constructed for use in the
initial screening of subjects and for determining how the subjects identified
the stimuli. This tape comprised a practice sequence consisting of four
repetitions of the base in conjunction with each of the two endpoint
transitions, followed by a test sequence with four sets of 27 stimuli each.
Across these sets, the nine transitions occurred twelve times each in a
randomized order. The inter-stimulus interval was 3 sec, the inter-set
interval was 6 sec.
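
The composition of the labeling test sequence (4 sets x 27 stimuli = 108 trials = 9 transitions x 12 repetitions) can be sketched as follows. The text does not say whether the randomization was constrained within sets, so this sketch simply shuffles all 108 trials and then cuts them into sets; that choice, the seed, and the function name are assumptions made for illustration.

```python
import random


def labeling_test_sequence(n_transitions=9, n_sets=4, set_size=27, seed=0):
    """Return the test sequence as a list of sets of transition numbers."""
    repetitions = (n_sets * set_size) // n_transitions  # 12 per transition
    trials = [
        transition
        for transition in range(1, n_transitions + 1)
        for _ in range(repetitions)
    ]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(trials)
    return [trials[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


labeling_sets = labeling_test_sequence()
```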

Our measure of discrimination performance was obtained by the method
known as AXB. (A and B are the two stimuli to be discriminated; X is one or
the other. The subject's task is to decide if X is less like A or less like
B.) We chose to present stimuli at three-step intervals along the continuum
of formant transitions, because pilot work (Mann, Madden, Russell, & Liberman,
1981) had suggested that for most subjects a separation of that size puts
discrimination of the chirps and the speech in a sensitive region--that is, it
keeps discrimination from falling to the floor or rising to the ceiling. This
step size also provided a sensitive measure of the context-induced shifts in
phonetic category boundary that were to be the concern of our second
experiment.

The duplex-perception discrimination tape consisted, then, of sets of
stimulus triads: one practice set and six test sets. Each such set contained
randomized sequences of the six possible three-step combinations of stimuli
along the continuum (i.e., by stimulus number: 1 vs. 4, 2 vs. 5, 3 vs. 6, 4
vs. 7, 5 vs. 8, and 6 vs. 9), occurring once each in AAB, ABB, BAA, and BBA
triads. Thus, over the course of the test sets, listeners responded to a
total of 24 triads for each pair. Within triads, the inter-stimulus interval
was 500 msec, the inter-triad interval was 3 sec, and the inter-set interval
was 6 sec.
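
The structure of the discrimination tape can be sketched as below: six three-step pairs, each appearing once per set in the AAB, ABB, BAA, and BBA orders, across six test sets, which yields the 24 triads per pair stated above. The seed, the function names, and the representation of a triad as a tuple of stimulus numbers are assumptions made for illustration.

```python
import random

# The six three-step pairs: 1 vs. 4, 2 vs. 5, ..., 6 vs. 9.
PAIRS = [(a, a + 3) for a in range(1, 7)]
ORDERS = ["AAB", "ABB", "BAA", "BBA"]


def make_set(rng):
    """One set: every pair once in each of the four triad orders, shuffled."""
    triads = []
    for a, b in PAIRS:
        for order in ORDERS:
            stimuli = tuple(a if slot == "A" else b for slot in order)
            triads.append((order, stimuli))
    rng.shuffle(triads)
    return triads


def make_test_tape(n_sets=6, seed=0):
    """The six test sets, each with its own randomization."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [make_set(rng) for _ in range(n_sets)]


test_tape = make_test_tape()
```

In each triad the odd stimulus is either the first (ABB, BAA) or the third (AAB, BBA), so the four orders balance the position of the correct response.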
An additional AXB discrimination tape was constructed to be used in
pretest screening of the subjects, since pilot work (Mann et al., 1981) had
suggested that some subjects encounter specific difficulty in discriminating
isolated chirps at three-step intervals along the continuum, and that such
subjects also fail to discriminate chirp components of the duplex percept.
This same tape served the further purpose of providing a basis for comparison
with the nonspeech side of the duplex percept. The stimulus arrangement was
analogous to that of the duplex-perception discrimination tape, save that
there was no base stimulus for presentation to the other ear, and different
randomizations determined the order of triads within each set.

Procedure 

Subjects in an initial pool of 14 were pretested in groups of three or
four while seated in a quiet room as the stimuli were played over earphones.
For convenience, the third-formant transitions were always presented to the
right ear and the base stimulus to the left. The purpose of the first pretest
was to see if the subjects could discriminate the transitions when they are
presented in isolation. To that end, subjects listened to the discrimination
tape that contained the isolated transitions and were instructed to respond
'A' or 'B' according to whether the first or the third stimulus of each triad
was less like the other two. Completion of the practice and test sets of item
triads was followed by a second pretest. This served two purposes. First, it
was a screening device by which we could determine whether subjects were
consistent in their labeling of the endpoint stimuli of the duplex [da]-[ga]
continuum. While the vast majority of subjects give consistent responses to
the endpoints of our continuum when the base and third-formant stimuli are
electronically fused, some subjects tend to give inconsistent responses when
base and transition are dichotically presented, and we wished to exclude such
subjects from our study. The second purpose served by the pretest was to
provide a full identification function by which to determine, for those
subjects in the main experiment, the extent to which discrimination on the
speech side of the duplex percept is categorical. Both purposes of the second
pretest were accomplished by having the subjects listen to the practice and
test sequences of the duplex labeling tape and respond 'd' or 'g' as
appropriate.

The subjects who survived the pretest participated in experiments that
provided the results we will present. These experiments were divided into two
sessions, one week apart and counterbalanced in order across subjects. In the
test sessions, as in the pretest, the third-formant transitions were always
presented to the right ear and the base stimulus to the left. In one session,
subjects were instructed that the goal was to determine how well speech sounds
could be discriminated in the face of some nonspeech distractors. They then
listened to the practice and test sets of the duplex-perception AXB
discrimination tape, responding on the basis of the perceived similarity in
the speech percepts of each stimulus triad. In the other session, the
subjects were instructed that the goal was to determine how well nonspeech
sounds could be discriminated in the face of speech sounds as distractors. At
this time, they also listened to the practice and test sets of the duplex AXB
discrimination tape, but responded on the basis of the perceived similarity
among chirp percepts. Subjects listened to the same tape in the two sessions,
but were kept in ignorance of this fact. They were instructed to listen to
the target speech sounds or chirps, according to the session, and to ignore
the "distractor" on the ground that attention to it could only impair their
performance on the assigned task.

Subjects 

The subjects were paid student volunteers recruited from an introductory
psychology course. All were female, and none had extensive experience in
listening to synthetic speech. Of an initial pool of fourteen, six subjects
were judged on the basis of the pretests to be insufficiently consistent in
their responses and were therefore excluded from the experiment proper, two
for having been unable to discriminate the isolated transitions at a level
above chance, and four for having been inconsistent in the way they labeled
the endpoints of the duplex continuum as 'd' (stimulus one) and 'g' (stimulus
nine). Thus the final subject group included a total of eight subjects who
participated in each of two sessions.



RESULTS 



We should first report the phenomenological results of the experiment,
which were clear. Given the variable third-formant transitions in one ear and
the remaining, fixed part of the acoustic pattern (the base) in the other, the
subjects did report duplex percepts: a syllable, [da] or [ga], depending on
the transition, and a nonspeech 'chirp.' The chirps on the nonspeech side of
the duplexity had a time-varying quality corresponding, apparently, to the
time-varying nature of the formant transitions. This is to say, they were not
noticeably different from what the subjects perceived when the transitions
were presented in isolation. On the speech side, the syllables [da] or [ga]
lacked the 'chirpiness' that characterized perception on the nonspeech side,
and they were not different from what listeners perceive when transitions and
base are mixed electronically and presented in the normal manner. The base,
which sounded like [da], was not perceived. That is, when the transition was
appropriate for [ga], listeners typically perceived [ga], not [ga] and also
(or half the time) [da]. Thus, perception was duplex, not triplex; listeners
perceived only speech (the fusion of base and transitions) and nonspeech (the
transitions as if in isolation).

Beyond these observations, the data (averaged across the eight subjects)
consist of discrimination functions for the speech and chirp components of the
duplex percept (Figure 2); a labeling function for the speech component of the
duplex percept (Figure 3a), together with the discrimination function (Figure
3b) that is predicted from it on the assumption of categorical perception
(Liberman et al., 1957); and a discrimination function for chirps presented in
isolation (Figure 4). Consider, first, Figure 2, which compares discrimination
of the duplex percepts under instructions to concentrate on speech (solid
line) with that under instructions to concentrate on chirps (dashed line).
Note that, while the overall level of performance on the two tasks is roughly
comparable, the shapes of the two functions differ markedly. This is verified
statistically by a significant interaction between the nature of the attended
percept and the stimulus pair being discriminated: F(5,35) = 13.9, p < .001.

The overall shape of the speech function--its marked peaks and troughs--
is consistent with categorical perception. To see how consistent, however, we

Figure 2. Discrimination of the third-formant transitions on the speech and
nonspeech sides of the duplex percept.

Figure 3. A) Labeling of speech percepts as [d] or [g]. B) Discrimination
function predicted from labeling responses, given the assumption of
categorical perception.

must compare the speech discrimination function that was obtained with the one
that is predicted on the assumption of perfectly categorical perception.
Plainly, the predicted discrimination function, which is in Figure 3b, is
quite similar to the one we obtained. We conclude, therefore, that when the
third-formant transitions were integrated into a phonetic percept, where they
provided critical support for the distinction between [da] and [ga], they were
perceived quite categorically.
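
The kind of prediction referred to here can be illustrated with a short
calculation. What follows is only a sketch, assuming the two-category form of
the prediction commonly attributed to Liberman et al. (1957): a listener who
can tell two stimuli apart only by the labels they receive is expected to be
correct with probability 0.5 + 0.5 (pA - pB)^2, where pA and pB are the
proportions of 'd' responses to the two stimuli. The function name and the
labeling proportions are hypothetical; they are not the data of Figure 3a.
The spacing of three steps between members of a pair is inferred from the
pairs named in the text (Pair 1-4, Pair 6-9).

```python
def predicted_discrimination(p_d, step=3):
    """Predicted proportion correct for each stimulus pair (i, i + step).

    p_d: proportions of 'd' labels, one per stimulus on the continuum.
    Assumes responses rest only on covert labels, with guessing whenever
    the labels give no basis for a choice (an illustrative assumption).
    """
    predictions = {}
    for i in range(len(p_d) - step):
        a, b = p_d[i], p_d[i + step]
        # Chance (0.5) plus a gain that grows with the labeling difference.
        predictions[(i + 1, i + 1 + step)] = 0.5 + 0.5 * (a - b) ** 2
    return predictions


# Hypothetical labeling function for a nine-step [da]-[ga] continuum.
labels = [1.0, 1.0, 0.9, 0.8, 0.5, 0.2, 0.1, 0.0, 0.0]
for pair, p in predicted_discrimination(labels).items():
    print(pair, round(p, 3))
```

Under this assumption the predicted function peaks for pairs whose members
fall on opposite sides of the labeling boundary and stays near chance for
pairs drawn from within a category.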

In contrast to the way the transitions were discriminated on the speech
side of the duplex percept is the discrimination function obtained with the
same transitions on the nonspeech side, where they were perceived as chirps.
As shown in Figure 2, the 'chirp' function has no marked peaks or troughs and
is similar in shape to the function obtained with isolated transitions in
Figure 4, although the absolute level is lower, F(1,7) = 7.3, p < .05. The
initial pair of rising chirps (Pair 1-4) is significantly more discriminable
than the final pair of falling chirps (Pair 6-9), both for isolated chirps,
t(14) = 4.37, p < .005, and for the chirp components of the duplex percept,
t(14) = 2.6, p < .02.

As noted by Mattingly et al. (1971), there are at least two strategies
that listeners might use in discriminating the isolated transitions: they
could, in effect, judge their slopes or, alternatively, their most apparent
pitches. If our subjects had opted for the first strategy, as the subjects in
the Mattingly et al. study appear to have done, then discrimination would have
been best for the transitions that straddle the horizontal transition
(Transition 5). But that was not the result. Rather, discrimination became
poorer as the transitions changed progressively from most falling to most
rising. That result leads us to take into account an observation by Brady,
House, and Stevens (1961), who noted that the most apparent pitch of frequency
ramps, which resemble isolated transitions, is closer to the frequency of
their offsets than their onsets. They also observed, however, that this effect
is stronger for rising ramps than for falling ones. Since our transitions have
variable onset frequencies but the same offset, we should suppose that if, as
in the study by Brady et al., the tendency to judge pitch by the offset
increased as the transitions changed from falling to rising, then we should
have obtained the decrease in discrimination that our results do, in fact,
show. We are inclined to conclude, therefore, that our subjects were, to a
considerable extent, discriminating the transitions on the basis of their most
apparent pitches.

Though the overall level of discrimination for the two sides of the
duplex percept was roughly equal, as noted earlier, discrimination of the
transitions on the speech side was, in its most sensitive region, better than
discrimination of the transitions on the nonspeech side. But, surely, we do
not therefore conclude that speech discrimination exceeds the resolving power
of the system, only that we have no idea how the resolving power is to be
measured. Beyond this truism, two observations are pertinent. One is that,
as can be seen by comparing Figures 2 and 4, the general level of nonspeech
discrimination obtained when the transitions were presented outside the duplex
context was somewhat higher than when they were perceived inside it. Perhaps
this should be attributed to distractions provided by the circumstance that,
in the duplex case, the two percepts, speech and nonspeech, were present at
the same time. The other observation is that we should not, in any case, rule
out the possibility that the human listener is, in fact, more sensitive to the
formant transitions when they support a phonetic percept than when they do
not. Indeed, Bentin and Mann (Note 1) have evidence that, in the matter of
absolute threshold sensitivity, the speech context does provide the more
sensitive measure--that is, the closer approximation to the physiological
limit--and for interesting reasons.

In summary, the difference between the two sides of the duplex percept is
very great indeed. On the nonspeech side, the formant transitions evoke a
percept that has the time-varying, chirpy quality that psychoacoustic
considerations should have led us to expect, and the discrimination function
is continuous. On the speech side, where the same formant transitions provide
critical support for the stops in the syllables [da] and [ga], there is no
apparent chirpiness in the percepts, and discrimination is nearly categorical.

EXPERIMENT II

The second experiment draws on the fact, noted in the Introduction, that
the category boundary along a synthetic [da]-[ga] continuum in which the
third-formant onset provides the sufficient cue can be systematically shifted
by the presence of a preposed [al] or [ar] (Mann, 1980). For stimuli preceded
by [al], the category boundary shifts towards a higher third-formant onset
(more 'g' responses), whereas a preceding [ar] causes a shift in the opposite
direction. Both perceptual shifts are consistent with observations about the
acoustic consequences of articulatory accommodation to the new contexts: stop
consonants that are coarticulated with a preceding liquid apparently
assimilate toward the place of liquid articulation. That is, stops preceded by
[al] tend to contain a higher third-formant onset frequency than those
preceded by [ar], suggesting that they receive a more forward place of
articulation. On that basis, Mann (1980) supposed that the perceptual context
effect of the (preposed) liquids reflects the application to perception of
some tacit knowledge about speech production. This in turn implies the
existence of some specialized phonetic process.

But, as we pointed out in the Introduction, the possibility of auditory
interaction exists, at least in principle. To control for such interaction,
we will again take advantage of duplex perception. That will be done by
putting the syllables [al] and [ar] in front of the 'base' of the dichotically
presented (and duplexly perceived) [da]-[ga] stimuli of Experiment I. We can
find out then whether the preposed [al] and [ar] affect perception of the
formant transitions on both sides of the duplex percept or, as we suspect,
only when they are perceived as speech.



METHOD 

Materials 

Stimulus continua. Two continua of disyllables were constructed by
putting in front of the synthetic stimuli from Experiment I naturally produced
syllables whose fundamental frequency and formant structure approximated those
of the synthetic stimuli and thus permitted the disyllable to be perceived as
a coherent utterance produced by one and the same vocal tract. An [al-da] to
[al-ga] continuum was formed in this way, using the base stimulus from
Experiment I and a token of [al] that had been excised from an utterance of
[al-da] produced by a male native speaker of English. An [ar-da] to [ar-ga]
continuum was constructed by putting in front of the base a token of [ar]
excised from an utterance of [ar-da] produced by the same speaker. In each
case, a 100-msec silent gap separated the offset of the natural syllable from
the onset of the synthetic one. The continuum of formant transitions that
cued the [d]-[g] distinction was as in Experiment I.

Test tapes. All stimuli were digitized at 10,000 Hz prior to being
recorded onto magnetic tape for the purpose of testing. The arrangement of
the stimuli on the magnetic tape was as in Experiment I, except, of course,
that the 'base' was preceded by [al] or [ar].

To determine how the subjects would identify the stimuli, and thus
provide a basis for predicting what perfectly categorical discrimination
functions should look like, we made a dichotic 'labeling' tape, appropriate
for duplex perception. It consisted of a practice sequence containing four
repetitions of each endpoint transition paired with [al] plus base, and four
repetitions of each endpoint transition paired with [ar] plus base, followed
by a test sequence containing eight sets of 27 stimuli each. Over the test
sets, each of the nine transitions occurred, in random order, a total of
twelve times in conjunction with each preposed syllable.
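
The counts in this design are mutually consistent, as a short calculation
shows. The sketch below uses only the numbers given above (nine transitions,
two preposed syllables, twelve presentations of each combination, sets of 27
stimuli). The shuffling scheme, the seed, and all names are illustrative
assumptions; nothing here describes how the original tape was actually
randomized.

```python
import random

TRANSITIONS = range(1, 10)   # nine third-formant transitions
CONTEXTS = ["al", "ar"]      # preposed syllables
REPETITIONS = 12             # presentations per transition per context
SET_SIZE = 27                # stimuli per test set


def build_labeling_sequence(seed=0):
    """Return shuffled (context, transition) trials, split into test sets."""
    trials = [(context, transition)
              for context in CONTEXTS
              for transition in TRANSITIONS
              for _ in range(REPETITIONS)]
    random.Random(seed).shuffle(trials)  # hypothetical randomization
    return [trials[i:i + SET_SIZE] for i in range(0, len(trials), SET_SIZE)]


sets = build_labeling_sequence()
print(len(sets), "sets,", sum(len(s) for s in sets), "trials in total")
```

Two syllables by nine transitions by twelve presentations gives 216 trials,
which is exactly eight sets of 27.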

To test discrimination by the method of AXB, another dichotic tape was
prepared in which the stimuli were recorded in triads, exactly as in
Experiment I, except that the base stimulus in half the triads was preceded by
[al] and in half by [ar]. Which syllable ([al] or [ar]) preceded the base was
randomized from trial to trial. For both [al] and [ar] conditions, the six
pairs of to-be-discriminated transitions were equally represented across the
triads, as were the various orders of transitions within each pair. As in
Experiment I, listeners gave a total of 24 responses to each pair of
transitions as preceded by each of the two syllables.
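
A trial list with these properties might be assembled as in the sketch
below. Several details are assumptions made for illustration and are not
stated in the text: that the six pairs are the three-step pairs 1-4 through
6-9, that 'the various orders' are the four AXB arrangements in which X
matches the first or the third item, and that each arrangement occurs equally
often. All names are hypothetical.

```python
import random
from collections import Counter

PAIRS = [(i, i + 3) for i in range(1, 7)]  # assumed pairs: (1,4) ... (6,9)
CONTEXTS = ["al", "ar"]                    # preposed syllables
RESPONSES_PER_PAIR = 24                    # per pair, per context


def axb_orders(a, b):
    """The four AXB triads in which X matches the first or the third item."""
    return [(a, a, b), (a, b, b), (b, b, a), (b, a, a)]


def build_axb_trials(seed=0):
    reps = RESPONSES_PER_PAIR // 4         # assumed: orders equally frequent
    trials = [(context, triad)
              for context in CONTEXTS
              for a, b in PAIRS
              for triad in axb_orders(a, b)
              for _ in range(reps)]
    random.Random(seed).shuffle(trials)    # context varies from trial to trial
    return trials


trials = build_axb_trials()
per_pair = Counter((context, tuple(sorted(set(triad))))
                   for context, triad in trials)
print(len(trials), "triads;", set(per_pair.values()), "per pair and context")
```

Under these assumptions the tape holds 288 triads, with 24 for each
combination of transition pair and preposed syllable.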

Procedure 

Experiment II was run in two experimental sessions that also included
Experiment I. Thus, in one session--the session in which the instruction was
to attend to speech percepts--the subjects first heard the labeling tape and
then the discrimination tapes for the two experiments. Order was
counterbalanced. In the other session, where the instruction was to attend to
chirp percepts, they also listened to the two discrimination tapes. Here, too,
order was counterbalanced.

Subjects 

The subjects were the same eight young women who participated in
Experiment I.

RESULTS

The point of this experiment, it will be remembered, was to test the
effects of a preposed [al] or [ar] on the perception of third-formant
transitions when, in the one case, they are integrated into a speech percept
and when, in the other, they are perceived as nonspeech chirps. To display
those effects, we have, in Figures 5 and 6, combined the results of
Experiments I and II. Discrimination functions for the speech side of the
duplex percepts are in Figure 5 and those for the nonspeech side in Figure 6.
A glance at these two figures reveals our main finding: context had a strong
effect on discrimination of the transitions on the speech side of the percept
but not on the nonspeech side. Looking more closely at the speech side in
Figure 5, we see that the peak in the function for [da]-[ga] syllables
preceded by [ar] (solid lines and open circles) is shifted to the right of
that obtained in Experiment I, where there was no preposed [ar] (solid lines,
closed circles). On the assumption that the location of the discrimination
peak reflects the location of the phonetic boundary, an assumption we will
justify later, the direction of the shift in the peak is consistent with the
earlier results of Mann (1980). Those same earlier results led us to expect a
shift in the opposite direction when [al] is preposed. As can be seen in the
function described by the dashed lines (filled circles), the nature of the
shift due to [al] is somewhat less clear. Possible reasons for this will be
discussed later. For the moment, however, the point to be made is that the
speech function obtained in this context is, in any case, different from both
of the other two.

In contrast to the results obtained on the speech side, the functions of
Figure 6 indicate that preposed [al] and [ar] had no effect on discrimination
of the transitions when they were perceived, on the nonspeech side, as chirps.

To support the assertions of the preceding paragraphs, we offer the
results of a three-way analysis of variance, conducted with the factors
attended percept (speech or chirps), context (isolated duplex stimuli, stimuli
preceded by [al], or stimuli preceded by [ar]), and stimulus pair. Although
there was no significant effect of attended percept, suggesting that the
average level of performance in our experiments was equivalent for speech and
chirps, there was an effect of context: F(2,14) = 5.38, p < .025, and an
effect of stimulus pair: F(5,35) = 5.83, p < .001. Most important to our
observations about the special influence of context on speech perception are
the interactions among the three main factors. First, there was an interaction
between attended percept and stimulus pair, revealing that the relative
difficulty of discriminating individual pairs depended on whether the
instruction was to attend to speech or to the chirps: F(5,35) = 13.18,
p < .001. Second, there was an interaction between attended percept and
context, revealing that the effect of context was greater for speech percepts
than for the chirps: F(2,14) = 11.59, p < [illegible]. Finally, there was an
interaction of context and stimulus pair: F(10,70) = 2.46, p < .025, and a
three-way interaction: F(10,70) = 2.00, p < .05. Separate analyses of variance
for the two percepts reveal that, in the case of the speech percepts, the
preceding syllables influenced both the level: F(2,14) = 12.35,
p < [illegible], and also the pattern of speech discrimination across stimulus
pairs: F(10,70) = 3.17, p < .005. For the nonspeech chirps, on the other hand,
an analysis of variance indicates that the preposed syllables had no
significant effect on either the level or the pattern of performance.
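
The degrees of freedom attached to these F ratios follow from the design, and
checking them is a quick way to confirm a reading of the statistics. The
sketch below assumes a fully within-subject analysis with eight subjects, in
which each effect is tested against its own interaction with subjects; the
factor names and the function are hypothetical.

```python
from math import prod

N_SUBJECTS = 8
LEVELS = {"percept": 2, "context": 3, "pair": 6}  # levels of each factor


def within_subject_df(factors, n_subjects=N_SUBJECTS):
    """Return (effect df, error df) for a fully within-subject effect."""
    df_effect = prod(LEVELS[f] - 1 for f in factors)
    # Error term assumed to be the effect-by-subjects interaction.
    return df_effect, df_effect * (n_subjects - 1)


for effect in (["context"], ["pair"], ["percept", "pair"],
               ["percept", "context"], ["context", "pair"],
               ["percept", "context", "pair"]):
    print(" x ".join(effect), within_subject_df(effect))
```

On these assumptions the main effects of context and stimulus pair carry
(2, 14) and (5, 35) degrees of freedom, and both the context by pair and the
three-way interactions carry (10, 70), in agreement with the values reported
above.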

Figure 5. The influence of preposed syllables, [al] and [ar], on
discrimination of the transitions on the speech side of the duplex percept.
The analogous function obtained without preposed syllables (Experiment I) is
reproduced for purposes of comparison.

Figure 6. The influence of preposed syllables, [al] and [ar], on
discrimination of the transitions on the nonspeech side of the duplex
percept. The analogous function obtained without preposed syllables
(Experiment I) is reproduced for purposes of comparison.

Figure 7. A) The influence of a preposed [ar] on labeling of speech percepts
as [d] or [g]. B) Corresponding predicted discrimination function,
given the assumption of categorical perception.

Having seen that the discrimination functions reflect an effect of
context on the speech side of the duplex percept, we should now consider the
extent to which those functions are predicted from the phonetic labeling
results, given the assumption of categorical perception. Consider, first, the
results obtained for stimuli preceded by [ar], as shown in Figure 7a. We see
that the [da]-[ga] boundary occurs somewhere between Stimulus 5 and Stimulus
8. Comparison with the boundary obtained for the isolated [da]-[ga] stimuli
of Experiment I (Figure 3) shows that, as in the earlier experiment by Mann
(1980), the [ar] context moved the boundary toward the [ga] end of the
stimulus continuum, thus increasing the number of [da] responses. On the
assumption of completely categorical perception (Liberman et al., 1957), we
should have expected to obtain the discrimination function shown in Figure 7b.
In fact, the discrimination function we did obtain (solid lines and open
circles of Figure 5) is quite similar to the expected one. Certainly, the
peak is in the right place and only slightly higher (as it so often is in such
situations) than it should have been. Thus, the obtained discrimination
function does reflect the phonetic boundary; moreover, it can be seen, by
comparison with the result for the isolated syllables, to reflect the
context-conditioned shift in that boundary caused by the preposed [ar].

As for the labeling function obtained with the preposed [al], seen in
Figure 8a, we note, first, a large inversion in the responses to Stimulus 1.
Putting that aside for the moment, we see that, by comparison with the
labeling data for the isolated syllables (Figure 3), the [da]-[ga] boundary
with preposed [al] is shifted strongly toward [da], producing, thus, an
increase in the number of [ga] responses. This, too, is consistent with the
earlier finding by Mann. However, the most extreme falling transition of her
earlier study did not evoke the large number of [ga] responses that its
counterpart (Stimulus 1) did in the present one. Of course, the conditions of
the two experiments were not identical. In the present experiment, but not in
the earlier one, the judgments were made on the speech side of a duplex
percept. Another difference between the experiments, and a second likely
cause of the difference in result, is that the stimuli were not exactly the
same. Perhaps, then, the most extreme falling transition of this experiment
went beyond the limit for [da]. At all events, we should note that in the
other two labeling functions obtained in this experiment ([da]-[ga] in
isolation, as in Figure 3, and [da]-[ga] with [ar] preposed, as in Figure 7)
there is also a tendency for the responses to the extreme falling transition
of Stimulus 1 to show some inversion toward [ga]. Perhaps the inversion in
the [al] context is simply an exaggeration of that tendency, and, as such, a
further reflection of the strong bias toward [ga] produced by the preposed
[al].

In any case, the labeling results for the [al] context yield the
predicted discrimination function seen in Figure 8b. There is only a low
peak, but its position reflects a shift in the phonetic boundary opposite to
that which was produced by the preposed [ar]. Looking now at the obtained
discrimination function in Figure 5, we see a moderately good fit to the one
that was predicted. We conclude, then, that in the [al] context, as in the
[ar] context, the discrimination function reasonably reflects the phonetic
boundary and the effect that context has on it.

Figure 8. A) The influence of a preposed [al] on labeling of speech percepts
as [d] or [g]. B) Corresponding predicted discrimination function,
given the assumption of categorical perception.

In striking contrast to the effects of phonetic context on the speech
side of the duplex percept is the absence of such effects on the nonspeech
side. As shown in Figure 6, and as previously noted, the discrimination
functions for the transitions perceived as chirps are much the same when [ar]
or [al] is preposed as when, in Experiment I, they were not. Moreover, the
shape of the functions reflects perception that is more nearly continuous than
categorical. The slopes indicate that, as in the case of the isolated
patterns of Experiment I, discrimination of falling transitions vs. less
falling ones was, other things equal, better than rising vs. less rising:
t(14) = 2.75, p < .02 for stimuli preceded by [al], and t(14) = 2.7, p < .02
for those preceded by [ar].

DISCUSSION

Our concern has been to account for two effects previously observed in
the perception of formant transitions as cues for stop consonants: tendencies
toward categorical perception and shifts in the positions of category
boundaries with phonetic context. Categorical perception, which we will
consider first, has two manifestations, at least in the case of speech
perception. The one, and the one to which attention has hitherto been directed
almost exclusively, is the discontinuity in perception that defines a boundary
on some physical continuum. The other is in the phenomenal nature of the
perceived category, which is more appropriate to a linguistic object than to
an auditory one (Liberman, 1982). In speech perception, these two
manifestations presumably reflect the same underlying process, but they are
separable, at least in principle, and we should take a moment to say how.

Given that the formant transitions are modulations in frequency, they
might be perceived, correspondingly, as modulations in pitch. If so,
perception could be nonetheless categorical. Thus, given a continuum of
transitions, the listener might perceive them discontinuously--for example, as
rising or falling pitches. Such automatic sorting of auditory percepts would,
of course, be of use to listeners since it would relieve them of having
deliberately to make the categorical assignments that the phonetic and
phonological structure of the language require. But if, as in this example,
perception of the transition cues, and all the other cues for the same phone,
retained their auditory character, then perception of speech would be like
perception of Morse code or some other arbitrary acoustic cipher. In that
case, a listener would perceive rising or falling pitches, together with the
auditory correlates of the many other acoustic cues, and have then to
'interpret' the resulting melange as a unitary phone. Presumably, the process
of interpretation would, in time, become automatic, as, indeed, it does with
people skilled at Morse, but the purely auditory character of the percept
would continue to intrude. This would be the more distressing because the
auditory percept has little or nothing to do with the linguistic function of
the phonetic unit it conveys.

To draw an analogy from visual perception of depth, consider how
confusing it would be if, in the use of the retinal disparity cue, we were
aware, not just of the distal depth, but also of the proximal disparity
(doubling of images) that provided the relevant information. Fortunately,
processing is accomplished in this case by a specialized module that uses the
proximal disparity to yield in consciousness only perception of the distal
depth relationships among visual surfaces.



We would argue, then, that a similar module operates in speech perception
to yield in consciousness only the distal phonetic object, free of the chirps
or glissandos we would otherwise hear. This would, as we have indicated, be
especially appropriate for the purposes of language, given that everything
that we need to know about a stop consonant, for example, has been provided
when any particular token has been identified as this stop consonant and not
that one. In that sense, a stop consonant represents nothing but the
categorical and abstract segment the speaker intended. Hence, awareness of
the auditory attributes of its various acoustic cues would, like awareness of
proximal retinal disparity, be irrelevant, at best, and, at worst, seriously
distracting.

As pointed out in the Introduction, listeners are, indeed, quite aware of
the auditory attributes of the transitions when they are presented in
isolation, in which case they sound like chirps, but not when, as part of a
larger acoustic pattern, they support perception of stop consonants. This
difference, as was also pointed out, occurs in conjunction with a difference
in categorical perception in the more usual sense: discrimination of the
transitions is continuous or categorical, depending on whether they are
perceived in isolation, as chirps, or, together with the rest of the acoustic
pattern, as stop-vowel syllables. As we have indicated, we find it plausible
to suppose that incorporation of the transitions into stop percepts, and, in
particular, the contrast this presents to their perception as chirps, reflects
a specialized phonetic process, well-adapted to providing just the abstract
categories the larger language system uses. But it is at least conceivable,
if implausible, that ordinary auditory perception is at work--that in this,
and in all the many similar cases where there exist parallels between speech
perception and speech production, the articulators are so controlled as to
produce exactly those combinations of cues that fit into independently
existing interactions of an auditory sort.

The second effect that concerns us, namely, that the positions of the
category boundaries shift with phonetic context, has been taken as a
reflection of the context-conditioned variation in the acoustic signal that
results from the way it is produced. Specifically, the variations in the
signal are the consequence of the coarticulatory arrangements that make it
possible for speakers to fold phonetic segments into larger units--syllables,
for example--and thus produce the segments much faster than they otherwise
could. (To do otherwise, in this case, would entail making each segment a
syllable--that is, to spell.) But listening to speech would be awkward if all
the auditory consequences of these context-conditioned variations were
prominent in consciousness. Given, in the cases we are concerned with, that
the perceptual compensation is made automatically--that is, that the category
boundaries shift appropriately--we assume that in this instance, too, we are
seeing the effect of a highly adaptive and distinctively phonetic process.
But, again, one might suppose, however implausibly, that the effect is simply
auditory--that in this, and in every other such case, coarticulation occurs,
not to make it easier to speak, but only to accommodate the sounds of speech
to the characteristics of the auditory system, and especially to auditory
interactions.

The purpose of the experiments reported here was to exploit the phenomenon
of duplex perception to provide data relevant to deciding between these
phonetic and auditory interpretations of stop consonant categories and their
movement with context. The results were quite clear. Given an isolated
third-formant transition appropriate for the stop in [da] or [ga] to one ear,
and the remainder of the acoustic syllable to the other, listeners perceived
the transitions in two phenomenally different ways: as nonspeech chirps, just
like those they perceived when the transitions were presented in isolation,
and as critical support for the stops in syllables [da] and [ga], in which
case the percept was just like the one that was evoked when the transitions
were electronically mixed with the rest of the acoustic pattern and presented
in the normal manner. The remainder of the acoustic syllable, which in
isolation sounds like speech, was not also perceived, which is to say that the
percept was duplex, not triplex. On the nonspeech side of the duplexity, the
chirp percept conformed reasonably to what psychoacoustic considerations might
have led us to expect. Moreover, perception of these chirps was continuous,
and there was no measurable effect of phonetic context. On the speech side,
there was a phonetic percept--a stop consonant--not readily describable in
auditory terms. In addition, perception was strongly categorical and the
category boundary moved in expected ways as a function of phonetic context.

We should emphasize that the two classes of percept were evoked by
transitions that were always paired, albeit in the other ear, with the
remainder of the acoustic syllable. Thus, the two constituents of the
dichotically presented pair, having been mixed in the nervous system, were
free to interact or not. If, in that circumstance, we were to attribute the
results on the speech side of the percept to interactions of an auditory kind,
what would we say then about the results on the other side? How would we, on
such an auditory account, explain why the dichotic constituents interact to
produce a normal [da] or [ga], but also fail to interact, not for both
constituents, but only for one--the isolated transitions? Why, that is, was
there perception of the isolated transition as such, but no comparable
'isolated' perception of the stimulus to the other ear, the 'base' that, by
itself, sounds like speech? To account for the fact that the percept was, in
this way, only duplex, we should suppose that there are two modes of
processing at work in the perception of the transitions, and that, happily 
from our point of view, the peculiar conditions of the dichotic presentation 
make the results of both modes available to consciousness. In the one mode, 
which is auditory, are the processes that underlie perception of the transi- 
tions as nonspeech chirps. In the other, which is phonetic, the transitions 
are incorporated into the speechlike pattern that was presented to the other 
ear, where they serve the singularly linguistic purpose of distinguishing the 
abstract categories [da] and [ga].

DUPLEX PERCEPTION: CONFIRMATION OF FUSION

Bruno H. Repp, Christina Milburn,+ and John Ashkenas+



Abstract. Duplex perception--the simultaneous perception of a
speech syllable and of a nonspeech "chirp"--occurs when a single
formant transition and the remainder (the "base") of a synthetic
syllable are presented to different ears. Two experiments were
conducted to test whether the speech percept derives from the
dichotic fusion of the transition with the base or from phonetic
information extracted directly from the isolated transition.
Experiment 1 showed that subjects were unable to assign speech
labels to isolated transitions in a consistent manner, although
the same transitions led to accurate identification when paired
with the constant base in the other ear. Experiment 2 used an AXB
paradigm to show that selective attention to the ear receiving the
base does not prevent the contribution of the contralateral
transition to the speech percept. Both experiments support the
hypothesis that the speech percept in the duplex situation results
from dichotic fusion at a fairly early stage in processing.



INTRODUCTION 

The phenomenon of duplex perception has been taken to support the
existence of a specialized phonetic mode for perceiving speech (Liberman,
1979; Liberman, Isenberg, & Rakerd, 1981; Mann & Liberman, in press). Duplex
perception occurs when a synthetic consonant-vowel syllable is split in a
certain way and presented dichotically (Rand, 1974). If the initial formant
transition that identifies the consonant is removed from the acoustic context
of the rest of the syllable and played in isolation, listeners report hearing
a nonspeech "chirp." When the rest of the syllable without the transition, the
"base," is played in isolation, listeners report hearing a syllable, sometimes
beginning with the same consonant as the whole syllable and sometimes not. If
the chirp is now presented to one ear and the base to the other ear, with the
two stimuli timed to coincide as they would in the whole syllable, listeners
report a duplex percept. In the ear to which the chirp was presented, they
hear a nonspeech sound--the chirp as it sounds when played in isolation. In
the other ear they hear speech that they correctly identify as the original
syllable from which the two stimuli were derived.

The standard explanation given for this phenomenon is that the base and
the chirp are fused to form the whole syllable that is heard in one ear, while
the chirp alone is also heard separately in the other ear (Cutting, 1976;
Liberman et al., 1981). According to this account, the chirp is heard
simultaneously as part of the fused speech syllable and as nonspeech (as it
sounds in isolation). The duplex phenomenon therefore supports the existence
of two distinct modes for perceiving sound: one auditory, for nonspeech
sounds, and the other phonetic, a mode of perception specialized for
processing speech (Liberman et al., 1981; Mann & Liberman, in press). Both
modes seem to be engaged simultaneously in the duplex situation.

The above account is based on listeners' introspections and has never
been tested directly. There are alternative theoretical possibilities,
however, that make such a test desirable. It has been suggested (Nusbaum,
Schwab, & Sawusch, Note 1) that although the formant transition in isolation
sounds like a nonspeech chirp, it may contain enough phonetic information for
listeners to identify the consonant that it cues. In the duplex situation,
listeners may then identify the syllable correctly on the basis of the chirp
alone, and since the base in the other ear sounds like (perhaps ambiguous)
speech, listeners merely attribute the speech percept to that ear. According
to this hypothesis, no fusion of the chirp and base occurs, and the formant
transition is perceived in exactly the same (simplex) way when it is presented
with the base as when it is not.

Two easily testable predictions follow from this nonfusion hypothesis:
(1) Isolated formant transitions should be identifiable as the consonants they
are intended to cue, and (2) listeners in the duplex situation should report
hearing the base when they focus their attention on the ear in which it
occurs. We conducted two experiments to examine these issues.

EXPERIMENT 1

The hypothesis that subjects might be able to assign phonetic labels to
isolated formant transitions is in apparent contradiction to claims in the
literature that these stimuli are pure nonspeech sounds (e.g., Mattingly,
Liberman, Syrdal, & Halwes, 1971). However, these claims may have been
exaggerated. Investigators familiar with stimuli of this kind will have noted
that, for example, isolated second-formant transitions derived from /ba/ and
/ga/ sound vaguely like /wa/ and /ya/, respectively. Since these glides share
place of articulation with the relevant stop categories, subjects may be able
to associate the two manner classes and thereby arrive at consistent labeling
responses. To make such an association is different from actually hearing
/ba/ and /ga/ (which is what subjects experience in the duplex condition).
Nevertheless, a recent demonstration that subjects indeed can label isolated
second-formant transitions in a consistent manner (Nusbaum et al., Note 1)
raises the question whether the speech percept in the duplex situation is
similarly derived from the chirps alone.

Experiment 1 used synthetic stimuli that formed a /da/-/ga/ continuum and
were distinguished only by the transition of the third formant. These
transitions are in a much higher frequency range than the second-formant
transitions employed by Nusbaum et al. (Note 1) and sound considerably less
speechlike. Duplex perception has been obtained with similar stimuli by Mann
and Liberman (in press). The present study attempted to replicate this
finding and tested, in addition, whether subjects can label third-formant
chirps consistently as /da/ or /ga/. The goal of the experiment was to
demonstrate that duplex perception can be obtained with chirps that, by
themselves, are not readily associated with phonetic categories.

Method 

Subjects. A total of twelve subjects participated. Eight of them were
student volunteers with little or no previous experience in speech perception
experiments. The other four were familiar with the purpose of the experiment
and included two relatively experienced (BHR and a fellow investigator) and
two relatively inexperienced listeners (CM and JA).

Stimuli. The stimuli were six three-formant synthetic syllables created
on the Haskins Laboratories parallel resonance synthesizer and forming a /da/-
/ga/ continuum. All syllables were 250 msec long and had linear 50-msec
initial transitions in all three formants, followed by a 200-msec steady
state. The first formant rose from 285 to 771 Hz, the second formant fell
from 1770 to 1233 Hz, and the third formant, which alone distinguished the six
syllables, started at a variable frequency and went to 2525 Hz. The onset
frequencies of the third formant in the six stimuli were 2862, 2694, 2525,
2348, 2180, and 2018 Hz. The "chirps" consisted of the 50-msec transition of
the third formant in isolation; the "base" consisted of a syllable without
that distinctive transition, i.e., with no energy in the third-formant region
during the first 50 msec. Consequently, there were six different chirps but
only one base.
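
The third-formant parameters just described can be laid out in a short sketch. This is only an illustration in Python, not the Haskins synthesizer and not the authors' procedure: the function names `f3_trajectory` and `direction` and the 5-msec display step are assumptions introduced here, while the frequencies and the 50-msec linear transition come from the text. No audio is synthesized.

```python
# Illustrative sketch only: tabulates the third-formant (F3) transitions
# described in the text. It does not synthesize any sound.

F3_ONSETS_HZ = [2862, 2694, 2525, 2348, 2180, 2018]  # stimuli 1-6 (from the text)
F3_STEADY_HZ = 2525   # F3 steady-state frequency (from the text)
TRANSITION_MS = 50    # duration of the linear transition (from the text)


def f3_trajectory(onset_hz, step_ms=5):
    """Return (time_ms, frequency_hz) points along a linear F3 transition.

    step_ms is an assumed sampling step, chosen only for display.
    """
    points = []
    for t in range(0, TRANSITION_MS + 1, step_ms):
        frac = t / TRANSITION_MS
        points.append((t, onset_hz + frac * (F3_STEADY_HZ - onset_hz)))
    return points


def direction(onset_hz):
    """Classify a chirp by the direction of its frequency glide."""
    if onset_hz > F3_STEADY_HZ:
        return "falling"
    if onset_hz < F3_STEADY_HZ:
        return "rising"
    return "level"


chirp_directions = [direction(f) for f in F3_ONSETS_HZ]
```

Under these parameters, chirps 1 and 2 fall in frequency, chirp 3 is level, and chirps 4 through 6 rise.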


Three tapes were recorded. On the first, the six chirps occurred in
isolation. On the second, the six full syllables were recorded, with the base
thrown in as a seventh stimulus. The third tape contained the six duplex
syllables, with the chirp on one channel and the base on the other. On each
tape, the stimuli were repeated 20 times in random sequence, with
interstimulus intervals of 3 sec.

Procedure. The subjects listened in groups over TDH-39 earphones in a
quiet room. The isolated chirps were presented first, to avoid any effects of
experience. The subjects were told that they would hear chirp-like sounds but
should do their best to label these sounds as "d" or "g," guessing if
necessary. The chirps were presented monaurally to the right ear. Next, the
full syllables and the base were presented monaurally to the left ear. The
subjects were instructed to identify the consonant in these syllables as "d"
or "g." This was followed by the duplex tape, with the base always in the left
ear and the chirps in the right ear. The subjects were told to ignore the
chirps and identify the syllables in their left ear. Finally, the eight
inexperienced subjects listened to the isolated chirps for a second time, to
determine whether exposure to the duplex condition had any beneficial effect
on chirp identification.

Results and Discussion 

A first inspection of the data revealed no difference between the results
of the first and second chirp identification tests for the naive subjects, so
both were combined. Furthermore, there were no systematic differences between
the results for naive and experienced or informed listeners, so their results
were pooled, too. The average results of all twelve subjects are displayed in
Figure 1.

The results are very clear. First, in both the full-syllable and duplex
conditions, the stimuli were labeled quite consistently, whereas labeling of
isolated chirps was totally random for the subject group as a whole. Second,
there was a sizable difference between the full-syllable and duplex labeling
functions; there were generally more "d" responses in the duplex condition,
F(1,11) = 11.7, p < .01.

The poor labeling performance for isolated chirps was expected. These
stimuli bore no resemblance to speech. While some of them sounded
discriminably different, at least to some listeners, they could not be
consistently associated with the two phonetic categories, "d" and "g."
Inspection of individual data revealed only two listeners (both from the
inexperienced group) who did label the stimuli in a consistent way: One
labeled stimuli 1 and 2 "d" and stimuli 3-6 "g" most of the time, while the
other labeled stimuli 1-3 "g" and stimuli 4-6 "d" throughout. These subjects,
at least, could discriminate quite accurately between different chirps, but
the opposite directions of their category assignments suggest that the
phonetic labels were used arbitrarily to designate the psychoacoustic
categories of rising and falling pitch. (Stimulus 3 had a level pitch.) The
experienced listeners probably could have made use of these categories also,
but did not because they tried hard to follow the instructions to hear the
stimuli as "d" or "g", which led to random performance.

Since all subjects gave orderly labeling responses in the duplex
condition, these data strongly suggest that the speech percepts in the duplex
situation were due to dichotic fusion and not to phonetic labeling of the
chirps. By implication, dichotic fusion may be assumed to occur also in
duplex situations involving somewhat more speechlike (viz., second-formant)
chirps.

The finding of a difference in labeling functions between the full-
syllable and duplex conditions is in need of explanation. One possibility is
that, in the duplex condition, fusion was not complete, so that the phonetic
category associated with the base exerted a bias on identification. The base
on the full-syllable tape was identified as "d" on 87.1 percent of the trials;
that is, it sounded essentially like /da/. The shift of the duplex labeling
function in favor of "d" responses is consistent with the hypothesis just
proposed. However, other data (Nusbaum et al., Note 1; Mann, Note 2) do not
seem to follow this pattern. An alternative possibility is that the duplex
condition favored the category associated with a falling critical formant
transition over the category associated with a rising transition. It has long
been known that the first formant exerts an "upward spread of masking" effect
on the perception of the higher formants; indeed, this effect motivated the
original research using duplex and split-formant stimuli (Rand, 1974; Nye,
Nearey, & Rand, 1974). This "masking" may be partially due to an
incompatibility in the direction of formant transitions (cf. Schwab, 1981):
Since the first formant in initial stop consonants is always rising in
frequency, the perception of simultaneous falling transitions in the higher
formants may be selectively impaired. Dichotic presentation may reduce this
incompatibility effect, and this may explain the increase in responses
corresponding to the category cued by falling formant transitions. This
explanation seems in agreement with data reported by Nusbaum et al. (Note 1)
but may not be universally valid (Mann, Note 2).

EXPERIMENT 2

Experiment 2 examined the hypothesis that subjects, when selectively
attending to the ear containing the base, might actually perceive the syllable
represented by the base and not the one thought to result from the fusion of
the base with a contralateral chirp. Despite instructions to ignore the
chirp, the labeling task of Experiment 1 may not have provided a sufficient
incentive for directing full attention to the ear containing the base. In the
present study, an AXB forced-choice paradigm was used instead, which required
subjects to make judgments about stimuli in one ear only. Subjects' inability
to recover the base under these conditions would provide further support for
early dichotic fusion as the cause of the reported speech percept.

Method

Subjects. The same subjects as in Experiment 1 participated in this
test, which was administered at the end of the same single session.

Stimuli. The stimuli were the two endpoints of the /da/-/ga/ continuum,
their duplex versions, and the isolated base. These five stimuli were
arranged into AXB triads in the following way: The A and B stimuli, which
were always different from each other, were either the two full syllables or
one of them and the base, in either order. The X stimuli inserted into these
six possible frames were the two duplex syllables and the base. This resulted
in 18 different triads that were recorded five times in random order, with
interstimulus intervals of 1 sec within triads and of 4 sec between triads.
All stimuli were recorded on the left channel except for the chirps of the
duplex syllables, which occurred on the right channel.
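
The triad count can be checked by enumerating the design. The following is a minimal sketch in Python; the string labels and variable names are assumptions introduced for illustration, and the random ordering of the recorded tape is deliberately left out.

```python
from itertools import permutations, product

# Unordered A-B pairs: the two full syllables, or one of them and the base.
FRAME_PAIRS = [
    ("full /da/", "full /ga/"),
    ("full /da/", "base"),
    ("full /ga/", "base"),
]
X_STIMULI = ["duplex /da/", "duplex /ga/", "base"]
REPETITIONS = 5  # each triad was recorded five times

# Each pair occurs in both orders, giving the six A-B frames.
frames = [order for pair in FRAME_PAIRS for order in permutations(pair)]

# Insert each X stimulus into each frame to form the AXB triads.
triads = [(a, x, b) for (a, b), x in product(frames, X_STIMULI)]

n_trials = len(triads) * REPETITIONS
```

Enumerated this way, the design has 6 frames and 18 triads, which at five repetitions each gives 90 trials.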

Procedure. The subjects were instructed to pay attention only to their
left ear and to judge in each triad whether the middle stimulus sounded more
similar to the first (response "1") or to the third stimulus (response "3"),
guessing if necessary. Note that the A and B stimuli were always monaural,
which forced attention to the ear receiving the base of the duplex X stimuli.

Results and Discussion 

The majority of the stimulus triads were uninformative and merely
provided the background for the critical triads. Since it was known from
Experiment 1 that the base by itself sounded like /da/, it was to be expected
that for a triad such as "full /da/, duplex /da/, base" subjects' judgments
would be fairly random, for they would hear "/da/, /da/, /da/." The critical
triads were those in which duplex /ga/ occurred between full /da/ and full
/ga/, or between the base and full /ga/. Because the base of duplex /ga/
sounds like /da/, duplex /ga/ should be judged to be more similar to either
full /da/ or to the base than to full /ga/ if fusion can be avoided. The
fusion hypothesis, of course, predicts exactly the opposite.

Figure 2: AXB similarity judgments (Experiment 2). [Axis: percent AXB
preference; X stimuli: base, duplex /da/, duplex /ga/; frame stimuli: base,
/da/, /ga/.]

There were no systematic differences between experienced and
inexperienced subjects, although the former provided somewhat more consistent
results. The results for all 12 subjects combined are displayed in Figure 2.
The figure shows the percentages of trials on which each of the three X
stimuli was judged to be more similar to either A or B. Each line shows one of
the three A-B frames, combining the two possible orders. The results are
unambiguous. When both A and B sounded like /da/ (line 1), subjects responded
randomly, although the duplex /ga/ was judged to be somewhat more similar to
the base than to the full /da/. When one frame stimulus sounded like /da/ and
the other like /ga/ (lines 2 and 3), the base and the duplex /da/ were judged
to be more similar to /da/, whereas the critical duplex /ga/ was judged to be
more similar to /ga/. Note in particular that, in the sequence "base, duplex
/ga/, full /ga/," the attended ear received two identical stimuli (the base)
followed by a different one; nevertheless, subjects chose the third stimulus
as being significantly more similar to the second than to the first,
indicating that the perception of the second stimulus was significantly
altered through fusion with the contralateral chirp.
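
Pooling the two presentation orders of a frame amounts to scoring each response against the serial position of one frame stimulus. The sketch below shows one way to do that in Python. It is an assumption about the bookkeeping, not the authors' scoring procedure; the function name `percent_judged_like` is hypothetical, and the responses in `toy_trials` are made up for illustration and are not data from the experiment.

```python
def percent_judged_like(trials):
    """Percent of trials on which X was judged more similar to a target.

    Each trial is (target_position, response). target_position is 1 or 3,
    the serial position of the target frame stimulus (for example, full
    /ga/) in the triad; response is the subject's answer, 1 or 3.
    Scoring against position pools the AXB and BXA orders of a frame.
    """
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(1 for position, response in trials if response == position)
    return 100.0 * hits / len(trials)


# Made-up responses for illustration only (not data from the experiment).
toy_trials = [(3, 3), (3, 3), (1, 1), (1, 3), (3, 3)]
toy_percent = percent_judged_like(toy_trials)
```

The complementary percentage (100 minus this value) gives the preference for the other frame stimulus.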

CONCLUSION

The present results strongly support the hypothesis that chirp and base
fuse at a fairly early stage in processing (see Cutting, 1976). This fusion
seems to be obligatory and, unlike some higher-level dichotic fusions (Sexton
& Geffen, 1981), to be unaffected by selective-attention strategies. The
present findings definitely refute the hypothesis that the phonetic percept in
the duplex paradigm derives from the assignment of speech labels to the
unfused chirp. The interpretation of duplex perception provided most recently
by Liberman et al. (1981) and by Mann and Liberman (in press) therefore
appears valid and provides a sound basis for further demonstrations of a
dissociation between phonetic and auditory modes of perception (Mann, Note 2).

ON THE KINEMATICS OF ARTICULATORY CONTROL AS A FUNCTION OF STRESS AND RATE*

Betty Tuller,+ J. A. Scott Kelso, and Katherine S. Harris++



Abstract. In this article we examine the effects of changing
speaking rate and syllable stress on the space-time structure of
articulatory gestures. Lip and jaw movements of three subjects were
monitored during production of selected bisyllabic utterances in
which stress and rate were orthogonally varied. Analysis of the
relative timing of articulator movements revealed that the time of
onset of gestures specific to consonant articulation was tightly
linked to the timing of gestures specific to the flanking vowels.
The observed temporal stability was independent of large variations
in displacement, duration, and velocity of individual gestures. The
kinematic results are in close agreement with our previously
reported findings (Tuller, Kelso, & Harris, 1982) and together provide
evidence for relational invariance in articulation.

Many studies of speech motor control have examined the effects that
linguistic constraints, such as phonetic context, level of stress, and
speaking rate, may have on movements of the articulators and their underlying
muscle activity. An alternative approach that we adopt here is to ask what
aspects of articulation might be preserved across these linguistic variations.
In a previous paper (Tuller, Kelso, & Harris, 1982) we suggested that it is
the internal timing relations of an utterance that remain stable across
variations in speaking rate and syllable stress. In that study we analyzed
the phase relations among various articulatory muscles and found that the time
of onset of activity for consonant production was relatively fixed in relation
to the time of onset of activity for the flanking vowels. This temporal
stability held across substantial changes in the peak amplitude and duration
of EMG activity in the individual muscles (Tuller, Harris, & Kelso, 1982). It
is not known, however, whether the kinematic structure of the articulatory
movement trajectories exhibits an analogous pattern.

To this end, we had one male and two female subjects produce utterances
of the form b-vowel-consonant-vowel-b with the medial consonant presented and
spoken as the first element of the second syllable. The first vowel (V1) was
either /ɑ/ or /æ/, the second vowel (V2) was always /ɑ/, and the medial
consonant (C) was either /b/, /p/, /w/, or /v/. In the rest of this paper,
/ɑ/ will be symbolized as /a/ and /æ/ as /ae/. Each utterance was spoken with
two stress patterns, with primary stress placed on either the first or second
syllable. The subjects read quasi-random lists of these utterances at two
self-selected speaking rates--one conversational (termed "slow" in the
figures) and the other somewhat faster. Each utterance was embedded in the
carrier phrase "It's a ___ again" to reduce the effects of initial and final
lengthening and prosodic variations. Twelve repetitions were produced of each
utterance.

Figure 1. Timing of lower lip raising for medial consonant articulation as a
function of the vowel-to-vowel period for one subject's productions of baCab
utterances.

Articulatory movement in the up-down direction was monitored using an
optical tracking system that followed the movement of lightweight infrared
light-emitting diodes attached to the subject's lips, jaw, and nose. In order
to minimize head movements during the experiment, output of the LED on the
nose was displayed on an oscilloscope placed directly in front of the subject,
who was told to keep the display on the zero line.

Acoustic recordings were made simultaneously with the movement tracks and
both were computer-analyzed on subsequent playback from FM tape. Acoustic
tokens were first excised from the carrier phrase using the PCM system at
Haskins Laboratories, then played in random order to four listeners who judged
each token's phonetic make-up and stress pattern. Tokens were omitted from
further analysis if more than one listener judged the token as having a
different stress pattern from the appropriate one or if any phonetic errors
were noted. After this procedure, at least nine tokens generally remained for
each utterance type.
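
The exclusion rule can be written as a single predicate. This is a hedged sketch in Python: the function name `keep_token`, the argument names, and the stress labels "first" and "second" are assumptions made for illustration, and only the rule itself (more than one dissenting stress judgment, or any phonetic error) comes from the text.

```python
def keep_token(intended_stress, stress_judgments, phonetic_errors):
    """Return True if a token survives the listener screening.

    intended_stress: the stress pattern the speaker was asked to produce
        (here labeled "first" or "second", an assumed encoding).
    stress_judgments: the stress pattern reported by each listener.
    phonetic_errors: total number of phonetic errors noted by listeners.
    """
    dissenting = sum(1 for s in stress_judgments if s != intended_stress)
    # Omit the token if more than one listener heard a different stress
    # pattern, or if any phonetic error was noted.
    return dissenting <= 1 and phonetic_errors == 0


# One dissenting listener out of four is tolerated; two are not.
one_dissent = keep_token("first", ["first", "first", "second", "first"], 0)
two_dissents = keep_token("first", ["second", "first", "second", "first"], 0)
```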

The movement records were input into a PDP 11/45 computer, using a
sampling rate of 200 Hz. To correct for up-down head movements, output of the
nose LED was subtracted (by a computer program) from output of the LEDs
attached to the lips and jaw. Similarly, movements of the lower lip were
corrected by subtraction for movements of the jaw. For each token,
displacement maxima and minima, and the times at which they occur, were
obtained individually for the jaw, the upper lip, and the lower lip corrected
for jaw movement.
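
The two subtraction steps and the search for displacement extrema might look as follows. This is a minimal sketch in Python and not the original PDP 11/45 program, whose details the text does not give. The function names, the simple three-point extremum test (which ignores flat plateaus and applies no smoothing), and the toy displacement values are all assumptions made for illustration; only the 200-Hz sampling rate and the order of the subtractions come from the text.

```python
SAMPLING_RATE_HZ = 200  # from the text


def subtract(signal, reference):
    """Sample-by-sample subtraction of a reference track from a signal."""
    return [s - r for s, r in zip(signal, reference)]


def local_extrema(samples, rate_hz=SAMPLING_RATE_HZ):
    """Return (kind, time_sec, value) for each strict local max or min."""
    found = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if cur > prev and cur > nxt:
            found.append(("max", i / rate_hz, cur))
        elif cur < prev and cur < nxt:
            found.append(("min", i / rate_hz, cur))
    return found


# Toy vertical displacement tracks (arbitrary units, made up).
nose = [0, 1, 1, 0, 0]
jaw_raw = [0, -1, -3, -2, 0]
lower_lip_raw = [0, 1, 0, -2, 0]

jaw = subtract(jaw_raw, nose)                    # head-corrected jaw
lower_lip_head = subtract(lower_lip_raw, nose)   # head-corrected lower lip
lower_lip = subtract(lower_lip_head, jaw)        # lower lip with jaw removed

jaw_events = local_extrema(jaw)
lip_events = local_extrema(lower_lip)
```

Real movement records would normally need smoothing before a test like this is applied, since sample-level noise produces spurious extrema.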






Recall that the main thrust of this study is to examine the relative
timing of articulatory movements. In keeping with various studies of
nonspeech motor skills, we chose to define articulatory timing in terms of the
phase relations among events in the movement trajectories. This requires
delimiting some period of articulatory activity and the latency of occurrence
of an articulatory event within the defined period. Over linguistic
variations, in this case stress and rate, these intervals will change in their
absolute durations. The question is whether they change in a systematically
related manner.
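
The period and latency defined above can be computed from three event times. The sketch below, in Python, is an illustration under stated assumptions: the function name `relative_timing` and the example times are hypothetical, and expressing the relation as a single ratio is a simplification introduced here, since the text does not commit to a particular formula.

```python
def relative_timing(period_start, event_time, period_end):
    """Return (latency, period, latency / period) for one token.

    period_start and period_end delimit a period of articulatory activity;
    event_time is the moment of an articulatory event inside that period.
    All times are in the same unit (for example, msec).
    """
    period = period_end - period_start
    latency = event_time - period_start
    if period <= 0:
        raise ValueError("period must have positive duration")
    return latency, period, latency / period


# Made-up event times in msec, for illustration only.
latency, period, ratio = relative_timing(0, 100, 250)
```

If latency and period changed in exact proportion across stress and rate, this ratio would stay constant from token to token; a looser but still systematic relation shows up as a high correlation between the two intervals.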

Our earlier electromyographic study (Tuller, Kelso, & Harris, 1982)
showed this temporal systematicity only when the latency of consonant onset
was considered relative to the period between vowel onsets. We used this
result to guide our investigation of articulatory kinematics, although the
phase relations of other events were also examined.

Figure 2. Timing of lower lip raising for medial consonant articulation as a
function of the vowel-to-vowel period for one subject's productions of baeCab
utterances.

Figure 1 shows one kinematic measure that is intuitively commensurate
with the temporally stable EMG measure for one subject's productions of the
utterances /babab/, /bapab/, /bavab/, and /bawab/. The x-axis represents the
interval (in msec) from the onset of jaw lowering for the first vowel to the
onset of jaw lowering for the second vowel. The y-axis is the interval from
the onset of jaw lowering for the first vowel to the onset of lower lip
raising for the medial labial consonant. In this figure and those following,
the jaw component has been subtracted from the lower lip movement. The
measurements for the axes are indicated schematically in the upper right-hand
corner. Each point on a graph is one token of an utterance type. Filled
circles are from tokens spoken slowly (that is, at a conversational rate) with
primary stress on the first syllable; open circles are tokens spoken slowly
with stress on the second syllable; filled triangles are spoken faster with
primary stress on the first syllable; open triangles are fast, stress on the
second syllable.

A Pearson's product-moment correlation and a linear regression were
calculated for each distribution. High correlations would signify that the
relative timing of these articulatory events was maintained over variations in
syllable stress and speaking rate. Obviously the calculated linear
correlations are very high: .95, .96, .91, and .94. The slope of each function
(m) is also indicated. Notice that the slopes for /p/ and /b/ are steeper than
for /v/ and /w/. This means that as the vowel-to-vowel interval increases,
the latency of lower lip movement increases proportionately more for
production of the stops than for production of /v/ and /w/.
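
For readers who want to reproduce this kind of analysis, the correlation and the regression slope (m) can be computed as in the sketch below. It is written in Python from the standard textbook formulas and is not the authors' program; the function name `pearson_and_slope` is hypothetical, and the period and latency values are made up so that the expected result is easy to verify by hand.

```python
import math


def pearson_and_slope(x, y):
    """Return (r, slope, intercept) for the least-squares line y = m*x + b.

    x: vowel-to-vowel periods; y: consonant latencies (same units).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Made-up intervals in msec (not data from the study): latency grows by
# 0.4 msec for every 1 msec increase in the vowel-to-vowel period.
periods = [200, 250, 300, 350]
latencies = [100, 120, 140, 160]
r, m, b = pearson_and_slope(periods, latencies)
```

A steeper slope means that the consonant latency grows by a larger share of each increase in the vowel-to-vowel period, which is the sense in which the text compares the stops with /v/ and /w/.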

Figure 2 shows the same measures for utterances whose first vowel was
/ae/, produced by the same subject. The interval from jaw lowering for the
first vowel to jaw lowering for the second vowel is on the x-axis; the timing
of lower lip raising for the medial consonant relative to jaw lowering for the
first vowel is on the y-axis. In these aeCa utterances, we find essentially
identical results as for the aCa utterances. The temporal changes are highly
correlated (.91, .87, .95, and .93), with the slopes of the functions for /p/
and /b/ steeper than for /v/ and /w/.

Figure 3 again shows the timing of medial consonant articulation relative
to the timing of the flanking vowels. In this case, however, we have defined
the onset of consonant articulation as the onset of the lowering gesture in
the upper lip. Utterances with medial /v/ are not included because no
systematic upper lip movement was noted. Again, the changes in duration of
the two measured intervals are highly correlated, ranging from .90 for
/baewab/ to .98 for /babab/.

Although Figures 1 through 3 illustrate the data from only a single
subject (GH), the two other subjects showed essentially the same pattern. The
left half of Table 1 shows the values for all three subjects obtained by
correlating the period between the onsets of successive vowel articulations
with the latency of onset of consonant articulation. Correlations obtained
when consonant articulation is defined by the raising gesture of the lower lip
are shown separately from correlations in which consonant articulation is
defined by the lowering gesture of the upper lip. The lowest correlation
obtained for any utterance was .84. Let us underscore that these high
correlations occur even though other aspects of the movements, such as their
displacement, velocity, and duration, change substantially. We also examined
the correlation between the duration of consonant-to-consonant periods and the
latency of production of the intervening vowel. The calculated correlations,
shown in the right half of Table 1, spanned a wide range of values (-.02 to
.72), with most correlations in the .2 to .65 range.

Figure 3. Timing of upper lip lowering for medial consonant articulation as a
function of the vowel-to-vowel period for one subject's productions of baCab
and baeCab utterances.

Table 1

Pearson's Product-Moment Correlations for All Three Subjects Describing
Relationships Between Various Periods and Latencies, as Indicated

1 Latency of V1 (jaw) to medial C (lower lip) relative to V1 to V2 (jaw)
period.

2 Latency of V1 (jaw) to medial C (upper lip) relative to V1 to V2 (jaw)
period.

3 Latency of C2 (lower lip) to V2 (jaw) relative to C2 to C3 (lower lip)
period.

4 Latency of C2 (upper lip) to V2 (jaw) relative to C2 to C3 (upper lip)
period.

To summarize, in this experiment, the timing of movement onset for
gestures appropriate to consonants was tightly linked to the timing of
movement onsets for vowel-related gestures. This stability of relative
articulatory timing was observed for all utterances and all speakers examined
and was independent of often large variations in duration, displacement, and
velocity of individual articulators. These kinematic results map rather well
onto the earlier EMG findings (Tuller, Kelso, & Harris, 1982) and together
provide evidence for relational invariance in articulation. The independence
of the relative timing of movements and muscle activities from modulations in
power or force appears to be an organizational scheme that speech production
shares with many other forms of coordinated activity (see Fowler, Rubin,
Remez, & Turvey, 1980; Grillner, 1982; Kelso & Tuller, in press; Kelso,
Tuller, & Harris, in press, for reviews). In fact, it appears to be the main
signature of muscle-joint ensembles when they cooperate to accomplish
particular tasks.

ON SIMULTANEOUS NEUROMUSCULAR, MOVEMENT, AND ACOUSTIC MEASURES OF SPEECH
ARTICULATION*

Thomas Baer and Peter J. Alfonso+




INTRODUCTION

Speech production is a process in which neuromuscular signals are
transferred into movements of the articulators; these signals are in turn
transferred, with coordinated activity of the larynx and respiratory system,
into the acoustic waveforms that form the ultimate output. As this brief
description implies, there are a number of levels at which the speech
production process can be studied. Electromyography can be used to study the
patterns of muscle activity, viewing techniques and indirect measures can be
used to monitor the resulting movements, and acoustic processing techniques
can be used to study the final output. In addition, aerodynamic measurement
techniques can be used to monitor the patterns of pressure and airflow that
contribute to speech movements and that provide the acoustic source for the
speech signal.

Few studies have measured and compared aspects of speech production at
several of these levels simultaneously. Such studies are scarce mainly
because they are technically difficult. However, modern advances in
instrumentation have made such studies more feasible, and the information to
be gained by collecting data simultaneously from several levels justifies
increased effort toward these ends.

Data obtained from several measurement levels simultaneously could not
only help in obtaining a better understanding of the interrelationships
between these levels, but also would be helpful in interpreting the
information in any one. The acoustic speech signal depends in a complex way
on the positions and movements of the various articulators, and these
movements depend, in turn, on the activity of several muscles. Given the
complexity of these relationships and the level of our understanding of their
details, we cannot always use the measurements at any one level to infer those
at the next level. For example, measurements of acoustic formant frequencies
cannot be reliably used to infer articulator movements. By studying speech
production at multiple levels simultaneously (for example, articulator
movements in parallel with acoustic output), we gain reliable information not
only about each level but also about their interactions, so that in the
future, information at one level may become a better predictor of information
at the other. Furthermore, we know that the purpose of the speech production
mechanism is to communicate phonetic information. Although it is assumed that
the input is in an invariant segmented form, the output as realized at any of
these measurement levels is of continuous and variable form, that is, the
output is highly encoded. Development of a complete understanding of the
nature of this code based on measurements at any single level--acoustic,
articulatory, or neuromuscular--has been elusive. Comparison of data at all
of these levels should help in determining the way in which the system is
organized to perform its function of transmitting phonetic information.

*A version of this manuscript was prepared as a chapter in D. Beasley,
C. Prutting, T. Gallagher, and R. G. Daniloff (Eds.), Current issues in
language science, Vol. II, Normal and disordered speech. San Diego:
College-Hill Press, 1982.

INSTRUMENTATION

The purpose of this section is to review briefly recent advances in
instrumentation that make simultaneous measurements more feasible than they
have been in the past. This instrumentation falls into two general classes:
(1) improved measurement devices for obtaining physiological signals, and (2)
improved computer techniques for analyzing, storing, and displaying these
signals.

Measurement Devices 

Cine or video films represent the most common source of speech movement
data. Here, the use of computer-assisted measurement procedures, for example,
digitizing tablets, significantly facilitates the extraction of quantitative
data from these films. More significantly, however, there are a number of new
instrumentation techniques that can be used to obtain movement data directly,
without the need for hand measurements of frame-by-frame records. One such
instrument is the x-ray microbeam system (Fujimura, Kiritani, & Ishida, 1973;
Kiritani, Itoh, & Fujimura, 1975), which uses a narrow computer-steered x-ray
beam to track in real time the movements of small metal pellets attached to
the articulators. The pellet positions themselves, as functions of time, are
the output. This procedure not only simplifies the analysis of the data, but
also reduces the x-ray exposure to the subject, allowing more data to be
collected with greater safety. Other instruments that may provide
measurements of tongue movements without the use of potentially harmful x-rays
are being developed. These include magnetometers and similar field sensing
devices (Perkell & Oka, 1980; Sonoda, 1977), ultrasonic measurement and
imaging devices (Niimi & Simada, 1980; Watkins & Zagzebski, 1973), and
photoelectric devices (Chuang & Wang, 1978). Dynamic electropalatographs for
real-time monitoring of tongue-palate contact patterns are now commercially
available. Many of the measurement techniques listed above can also be used
for monitoring lip and jaw movements. In addition, strain gauge (Muller &
Abbs, 1979; Sussman & Smith, 1970-a, 1970-b) and video (McCutcheon, Fletcher,
& Hasegawa, 1977) techniques have been used for these measurements. A
commercially available opto-electronic device originally developed for
monitoring gait movements (Lindholm & Oeberg, 1974) is especially well suited
for monitoring lip and jaw movements by automatically measuring the positions
of miniature light-emitting diodes attached to these articulators.

For monitoring laryngeal and velopharyngeal activity, the fiberoptic
endoscope permits the observation and measurement of movements during
unimpeded speech (Sawashima, Abramson, Cooper, & Lisker, 1970). These
measurements will become more quantitative with the development of
stereoscopic viewing techniques (Fujimura, Baer, & Niimi, 1979).
Transillumination methods, which may use the fiberoptic endoscope as a light
source (Löfqvist & Yoshioka, 1980), can be used to measure glottal movements
without frame-by-frame hand measurements. Several other glottographic
methods, most notably electroglottography (Fourcin, 1981) and acoustic inverse
filtering (Rothenberg, 1981), have recently been improved. Ultrasonic
measurement (Hamlet, 1981; Kaneko, Uchida, Suzuki, Komatsu, Kanesaka,
Kobayashi, & Naito, 1981) and imaging techniques also hold potential for
future applications.

Computer Techniques

Figure 1 illustrates the magnitude of the problems associated with data
analysis, storage, and display of simultaneous speech measurements. This
figure shows a small sample of the EMG, aerodynamic, acoustic, and movement
data collected during the course of one experiment. Although there seems to
be a great deal of data in this figure, in fact it represents only a small
sample of the complete data set. Each column represents a different channel.
The top row represents the average pattern of activity for each channel for a
single type of utterance. These averages are calculated from ten repetitions,
or tokens, of the utterance. In the remaining rows, we show the patterns of
activity for only four of these tokens. The left column shows the EMG
patterns, recorded from a single insertion into the levator palatini muscle,
and the second column shows the same data after smoothing. Aerodynamic and
acoustic measures are shown in columns three and four. The movement data,
shown in the rightmost column, were measured frame by frame from a cine film.
An experiment of this type may contain 20 to 30 different types of utterances.
Thus, the volume of data obtained from multiple-level measurements of speech
production can be staggering and the problem of synchronizing and co-analyzing
these data is significant. The development and improved accessibility of
computer processing equipment and techniques contribute in an important way to
the feasibility of this research. Improvements in size, speed, and price of
modern computers and their associated peripheral equipment have greatly
facilitated sampling and digitizing a large number of signals, bringing them
into synchrony, and performing analysis and display operations. Because of
the large number of signals involved in these experiments, it is important to
have flexible, rapid, interactive access to the data, especially for
generating comparative displays, to aid in forming hypotheses about the
relationships among the signals. The ability to submit the data to formal
analysis procedures, such as cross-correlations, is also important for
quantifying these relationships. The development of facilities to perform
these operations is greatly simplified using the hardware and software support
available with modern computers. For some of the more difficult procedures,
such as analysis of the acoustic signals and statistical analysis, application
software can be obtained commercially.

Figure 1. Averages, shown on the top row, are based on ten tokens. The
remaining four rows show the first four tokens. The data represent, from
left to right, EMG, aerodynamic, acoustic, and velar movement measures
related to the utterance /fassmap/, aligned around the /s/-/m/ boundary.
The first column shows the EMG average and individual tokens without
software smoothing. The second column shows the same data smoothed with a
35 msec window.
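
The token averaging and window smoothing described for Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing actually used in the experiment: the function names, the choice of a centered moving average as the smoothing method, and the representation of the line-up point as a sample index are all hypothetical. The text specifies only that averages are taken over ten tokens aligned at a boundary and that the smoothing window is 35 msec.

```python
from statistics import mean


def smooth(signal, sample_rate_hz, window_msec=35.0):
    """Centered moving average (an assumed smoothing method).

    The window spans 2 * half + 1 samples and is truncated at the
    beginning and end of the signal.
    """
    half = int(round(window_msec * sample_rate_hz / 1000.0)) // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(mean(signal[lo:hi]))
    return out


def ensemble_average(tokens, lineup_indices, pre, post):
    """Average tokens sample by sample after aligning them.

    Each token is cut to `pre` samples before and `post` samples after
    its own line-up index (hypothetical representation of the line-up
    point), so that the line-up points coincide before averaging.
    """
    aligned = []
    for token, k in zip(tokens, lineup_indices):
        if k - pre < 0 or k + post > len(token):
            raise ValueError("token too short for the requested window")
        aligned.append(token[k - pre:k + post])
    return [mean(column) for column in zip(*aligned)]
```

In use, each channel of each token would be passed through `smooth` (for the EMG channel) and the tokens of one utterance type would then be passed together to `ensemble_average`, with one line-up index per token.
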

SIMULTANEOUS MEASURES

Measurements of Vocal Tract Dynamics

As an example of the usefulness of simultaneous measures, we will
consider in this section our own research on the dynamics of vowel production
(Alfonso & Baer, 1982). The purpose of this study was to examine the dynamics
of vowel production in a specific environment, namely /əpVp/, by
simultaneously monitoring muscle activity using electromyography, articulatory
movements using cinefluorography, and acoustic output. A single speaker
produced multiple repetitions of ten vowels in the frame environment. For two
of these repetitions, cinefluorographic films were made at a rate of 60 frames
per second. Lead pellets were glued to the tip, blade, and dorsum of the
tongue and to the upper and lower incisors to serve as discrete reference
points for measurements. Throughout the run, EMG signals were recorded
through hooked-wire electrodes from a number of articulatory muscles,
including the posterior part of the genioglossus muscle. Good quality
acoustic recordings were also made.

Measurements during the vocalic period. Considering first the acoustics,
we analyzed the formant frequency trajectories for each token and produced a
traditional F1-F2 plot using the peak formant frequencies representing each
vowel. The plot is shown in Figure 2. Such plots are often used to infer
vocal tract shape characteristics. However, these vocal-tract shape
characteristics depend on the positions of several articulators, most
significantly tongue, lip, and jaw. The F1 and F2 dimensions shown in
Figure 2 are often associated with the tongue high-low and front-back
dimensions, respectively. These generalizations ignore the separate effects
of lip and jaw positions that are usually assumed to vary in a manner
dependent on tongue position. That is, jaw position is assumed to vary with
tongue height, and lip configuration (spread-round) is assumed to vary with
tongue (front-back, high-low) dimensions. Thus, the vowels /i/ and /u/ are
assumed to have both high tongue and high jaw positions, while /æ/ and /a/
are assumed to have both low tongue and low jaw positions.

Analysis of the x-ray film in this experiment showed that tongue position
varied as expected across vowels but that jaw movements were negligible.
Figure 3 shows the trajectories of the tongue dorsum pellet for each vowel
during the interval from its voice onset until lip closure for the final
consonant (i.e., the vocalic period). The pattern of locations of the
endpoints of the trajectories grossly resembles the vowel pattern in the
acoustic domain shown in Figure 2, although it may be noted that the
diphthongized vowels /e/ and /o/, as might be expected, do not fit this
pattern as well as the remaining vowels. Thus, comparisons of acoustic and
cinefluorographic measurements from this experiment show that the formant
frequency measurements provide, in this case, a reasonable estimate of the
position of the tongue (as indicated by the position of the tongue dorsum
pellet), but that jaw position cannot be inferred from the acoustic data.

The movement data thus show that tongue movements did not contain any
components due to jaw movements, but rather were controlled independently
during this experiment. Looking one level deeper into the system, we
confirmed by recording from a jaw muscle--namely, the anterior belly of the
digastric--that there was little or no jaw-related muscular activity. To
investigate further the control of the tongue movements, we examined the
activity of the posterior part of the genioglossus muscle, which is thought to
participate in tongue fronting and bunching.

A comparison of peak EMG values for the ten vowels is shown in Figure 4.
The figure shows that there is greater activity for the high vowels than for
low vowels. Among the high vowels, there is greater activity for those with
front than back tongue positions, as expected. Dynamic measurements were used
to document further the function of the posterior genioglossus muscle in
fronting and raising. Figure 5 shows, on the left, the relationships between
genioglossus EMG activity and tongue movements in the horizontal and vertical
dimensions during /i/. The line-up point, zero on the abscissa, represents
the voice onset of the vocalic segment. The right side of the figure shows
correlation functions between the pairs of curves shown on the left. The
correlation functions reach nearly unity at latencies of about 110 msec, a
reasonable value for the mechanical response time of this muscle-articulator
system. This result is consistent with the view that posterior genioglossus
activity contributes to both vertical and horizontal tongue movements. Thus,
EMG recordings from the posterior genioglossus muscle and the anterior belly
of the digastric are consistent with the tongue and jaw movement data in that
jaw position is stable and that tongue position is independently controlled by
the extrinsic tongue muscles.
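
A correlation function of the kind described above, and the latency at which it peaks, can be sketched as follows. This is a minimal illustration and not the analysis software used in the study. It assumes that both signals have been sampled at the same rate, that the correlation at each lag is a Pearson coefficient over the overlapping samples, and that a positive lag means the movement follows the EMG. The function names and this sign convention are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_function(emg, movement, max_lag):
    """Return {lag_in_samples: r} for lags from -max_lag to +max_lag.

    At a positive lag the movement signal is shifted earlier by that many
    samples before being compared with the EMG, so a peak at a positive
    lag means the movement follows the EMG. Only the overlapping part of
    the two signals is used at each lag.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = emg[:len(emg) - lag], movement[lag:]
        else:
            x, y = emg[-lag:], movement[:len(movement) + lag]
        n = min(len(x), len(y))
        if n >= 2:
            result[lag] = pearson(x[:n], y[:n])
    return result


def peak_latency_msec(emg, movement, max_lag, sample_rate_hz):
    """Lag (in msec) with the largest absolute correlation, and that correlation."""
    corr = correlation_function(emg, movement, max_lag)
    best = max(corr, key=lambda lag: abs(corr[lag]))
    return best * 1000.0 / sample_rate_hz, corr[best]
```

Taking the peak by absolute value lets the same routine report a strongly negative peak, such as the opposite-signed correlation that the text later describes for horizontal movements during /u/.
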

In summary, results to this point show that EMG, movement, and acoustic
measurements are in general agreement with each other regarding lingual
behavior during the vocalic period. Next, we wanted to consider anticipatory
tongue movements for the vowels.

Measurements preceding the vocalic period. Considering first the
acoustics during the period preceding the vocalic segment, measurements in
this domain are obviously not very informative, since the schwa segment
preceding the vowel is of short duration and low intensity, making spectral
analysis difficult. Furthermore, no acoustic measures other than duration can
be made during the stop occlusion or preceding the schwa, since there is no
acoustic energy during these periods.

Considering movement data next, Figure 6 shows sagittal plane
trajectories for the tongue dorsum pellet for four of the vowels. The time
interval for these plots begins at the voice onset for the schwa and ends at
lip contact for the final consonant. Lines forming ellipse-like enclosures
have been superimposed on the trajectories in Figure 6 to indicate three
different time intervals. The trajectories during the production of the schwa
are enclosed by the inner line. The trajectories during the production of the
bilabial closure are enclosed by the outer line. With the exception of /a/,
trajectories after the consonant release appear outside the region enclosed by
the lines.

Considering tongue positioning during the schwa, we note that the region
is long and flat. Anticipatory movement for the back vowel /u/ occurs
primarily in the horizontal direction but very little in the vertical
direction. The front vowels cluster near the left end of this region, and
demonstrate only small movements before the period of consonantal closure.
Within the /p/ closure region, the trajectories continue to spread
horizontally and also lower. Finally, the trajectories move toward the
extremes of the space.

Figure 5. Genioglossus EMG activity with tongue dorsum horizontal movement
(top left) and with tongue dorsum vertical movement (bottom left)
during /i/. Correlation functions between the EMG curve and the
respective movement curve are shown on the right.

Figure 6. Movement trajectories of the tongue dorsum pellet during the
interval beginning with voice onset of the schwa, including the
initial consonant and the vowel, and ending with the lip contact
for the final consonant. Trajectories during the production of the
schwa are enclosed by the inner solid line, during the production
of the initial bilabial closure are enclosed by the outer solid
line, and during the interval from the release of the initial
consonant to the lip closure for the final consonant appear outside
the solid lines.

The next two figures show the time course of tongue dorsum movements for
all ten vowels. First, we consider the vertical dimension, shown in Figure 7.
In this plot, the lineup point--zero time--is the onset of voicing for the
vowel. Implosion for the consonant occurs at different times depending on
vowel type, and ranges from about 120 to 160 msec before the lineup point.
Vertical tongue position curves for all ten vowels begin to diverge from each
other at about the time of implosion. Therefore, the onset of vertical
vowel-related movements appears to be time-locked to the consonant.

Horizontal movements shown in Figure 8 are different. These curves are
separate even at the earliest time measured^^^50 msec before voice onset for
the vowel. More significantly, the curves for back vowels and high front
vowels begin to diverge from each other almost immediately. Notice that while
backward movements for the back vowels begin much earlier than their vertical
movements, the fronting movements for front vowels begin only at about the
same time as their vertical movements--that is, at about the moment of
implosion.

Finally, we consider EMG data related to anticipatory tongue movements.
The posterior genioglossus EMG data for /i/, shown in Figure 5, demonstrate
that vowel-related EMG activity begins over 200 msec before the lineup point,
the voice onset, or slightly more than 100 msec before the onset of vertical
and horizontal tongue movements. Data for /u/ are shown in Figure 9. As
indicated in Figure 4, the value of peak activity for /u/ is less than that
for /i/. The timing of EMG activity for the two vowels is similar, although
the onset of activity for /u/ appears to be somewhat later than that for /i/.
Comparison of Figures 5 and 9 shows that tongue vertical movements for /u/ and
/i/ begin at about the same time, but horizontal tongue movements for /u/
begin much earlier. This observation is supported by a comparison of the
correlation functions for /i/ and /u/. The correlation functions for
vertical and horizontal movements for /i/ and vertical movements for /u/ all
appear roughly similar, showing a peak in the vicinity of 100 msec, while the
correlation function for horizontal movements for /u/ has its peak at or
before 0 msec and has the opposite sign. These results suggest that the
posterior part of the genioglossus muscle contributes to fronting and bunching
movements for these vowels, but not to the backing movements for /u/.

Similar patterns of genioglossus activity were reported by Raphael and
Bell-Berti (1973) for the same talker producing six of these vowels in a
similar frame. The Raphael and Bell-Berti study, in addition, reports data
from other lingual muscles. Their data, as well as our own, demonstrate that
the onset of genioglossus activity never preceded the onset of voicing for any
vowel by more than 230 msec. For back vowels, however, styloglossus muscle
activity begins at least 500 msec before the onset of voicing. This muscle is
thought to participate in tongue backing. Thus, EMG data suggest a timing
difference for backing and fronting maneuvers for this subject.

We can perhaps explain the difference between fronting and backing on
physiological grounds. At least for the high front vowels, a single
muscle--namely the genioglossus--may be primarily responsible for moving the
tongue both forward and upward. On the other hand, tongue backing is achieved
by muscles other than the genioglossus--for example, the styloglossus. Thus,
backing movements could occur independently from vertical movements in high
back vowels.

Figure 7. Tongue dorsum vertical movements. Zero time represents the onset
of voicing for the vowel. Implosion of the initial consonant
ranged from -120 to -160 msec depending on vowel type, and is shown
by the rectangle.

Figure 8. Tongue dorsum horizontal movements. Zero time represents the onset
of voicing for the vowel. Implosion of the initial consonant
ranged from -120 to -160 msec depending on vowel type, and is shown
by the rectangle.

Figure 9. Genioglossus EMG activity with tongue dorsum horizontal movement
(top left) and with tongue dorsum vertical movement (bottom left)
during /u/. Correlation functions between the EMG curve and the
respective movement curve are shown on the right.

Why the timing of vertical movements should be different from that of
horizontal movements cannot be determined from the above data alone. Several
explanations are possible. On physiological grounds, it may be that backing
movements must begin earlier because they are intrinsically slower than
raising and fronting movements. On perceptual grounds, anticipatory vertical
and horizontal movements may be necessary in that they spread phonetic
information across neighboring segments. However, in this context, there may
be physiological constraints that restrict anticipatory vertical tongue
movements. Other explanations might rest on acoustic/aerodynamic grounds.
The point we wish to emphasize, however, is that the conclusions about
differential control of tongue horizontal and vertical movements could not
have been reached without simultaneous movement and EMG measurements.

Measurements of Laryngeal Function 

Phonatory function. Simultaneous measurements are particularly important
in studies of laryngeal function. In studies related to phonation, the
relationships among acoustic output, vocal-fold vibration patterns,
aerodynamic conditions above and below the folds, and patterns of muscle
activity are imperfectly understood. It is important to make these
measurements simultaneously in order to understand the phonatory mechanism
better (Baer, 1981). In addition, because of the anatomical complexity of the
larynx and its inaccessibility for measurements, most of the desired
information cannot be obtained directly, but must be inferred from indirect
measurements. There are a number of complementary methods for monitoring
phonatory vibrations, each of which provides only partial information. Taken
together, however, they significantly increase our understanding of phonation.
Figure 10, for example, shows simultaneous signals obtained by acoustic
recording, by electroglottography (EGG), and by transillumination, or
photoglottography (PGG), during sustained phonation of the vowel /i/ at
varying intensities. The acoustic signal provides information about the
pattern of airflow through the glottis, but this information is available
only indirectly, after the signal has been filtered by its passage through the
vocal tract, and it is thus difficult to interpret. The electroglottographic
signal contains information mostly about the closed period, while the
photoglottographic signal contains information mostly about the open period.
Across the three levels of intensity, the PGG signal shows an inconsistent
pattern of changes, but the EGG signal shows systematically sharper
deflections. A comparison of the EGG with PGG shows significantly less time
overlap as the intensity is increased. This evidence can be interpreted as
showing that the closing of the glottis occurs more abruptly and with less
phase difference along its anterior-posterior extent with increases of
intensity. Together, the signals thus contain significantly more information
about the mechanism used for varying intensity than could be obtained from any
one of them alone.

While acoustic measurements can reveal some information about the larynx
during phonation, they contain little information about the state of the
larynx during the production of unvoiced speech segments. For example, it
cannot be determined from acoustic analysis alone whether the glottis is open
or closed during voiceless periods. Thus, measurements of laryngeal function
in any single domain have limited value. Furthermore, simultaneous
measurements are particularly important to understand the coordination of
laryngeal with respiratory and articulatory information.

Figure 10. Simultaneous glottographic waveforms of phonation at three differ-
signals are plotted with transconductance (representing contact

Laryngeal behavior of stutterers. Laryngeal function in stuttering has
been a subject of considerable research interest in recent years. Many of the
studies in this area have concentrated on voice onset time and laryngeal
reaction time (Adams & Hayden, 1976; Cross & Luper, 1979; Cross, Shadden, &
Luper, 1979; Reich, Till, & Goldsmith, 1981; Starkweather, Hirschman, &
Tannenbaum, 1976; Watson & Alfonso, 1982). More generally, these studies
concern transitions between unvoiced and voiced states in speech and nonspeech
environments. Many of these studies have been based entirely on acoustic
measurements, and have concentrated on measurement of acoustic latencies such
as voice onset time or voice initiation and termination time. These measures
are useful in identifying differences between stutterers and their controls.
However, acoustic measures alone have limited usefulness in identifying the
nature of deficits that contribute to any such differences. In the following
section, we will consider a series of our own experiments based on acoustic
measures, and will indicate how measures of laryngeal movements and EMG could
contribute to the interpretation of the results.

In many of the experiments designed to investigate laryngeal function in
stutterers that have been documented in the literature, subjects are asked to
respond to a stimulus by initiating phonation as rapidly as possible. Using
this experimental paradigm, Adams and Hayden (1976) and Starkweather et
al. (1976) were the first to demonstrate that stutterers, as a group, have
longer onset latencies than normal speakers. Recently, the size of the
latency has been found to vary with stuttering severity (Alfonso, Watson, &
Husso, 1981; Borden, 1981). While these experiments are useful for
identifying group differences, they give little insight into the cause of the
differences. Our own experiments can serve as an example. In our initial
study (Watson & Alfonso, 1982), we followed procedures similar to those of
Adams and Hayden (1976) and Starkweather et al. (1976) except that subjects
were first presented with a warning cue, and after a variable interval of one
to three seconds, were presented with a cue to phonate. The interval between
the warning cue and the phonate cue is referred to as the foreperiod. In this
first experiment, stutterers rated in severity as mild to moderate and normals
did not have significantly different response times. This led us to speculate
that foreperiods of one to three seconds could apparently be utilized by
subjects in the stuttering group to prepare for the upcoming response so that
they could then perform the remaining initiatory movements with the same
latency as nonstuttering subjects. We further speculated that the group
differences reported in other experiments were associated with abnormal
preparatory activity associated with the voice onset rather than with the
initiation of voice. We conducted further experiments to test this hypothesis
by extending the durations of the foreperiods from 100 to 3000 msec (Alfonso
et al., 1981). The results are shown in Figure 11. At short foreperiods,
when there is little preparation time, mild and severe stutterers have similar
reaction times, and these times are significantly longer than those of
nonstutterers. At longer foreperiods, mild stutterers are significantly
faster than severe stutterers. Therefore, the preparation time afforded by
the foreperiod allowed mild stutterers to phonate nearly as quickly as normal
speakers, whereas preparation times even as great as three seconds did not
allow severe stutterers to reach normal values. We hypothesized, on the basis
of the acoustic data alone, that mild stutterers' delayed onset latencies are
primarily related to laryngeal posturing difficulties whereas severe
stutterers' delayed onset latencies are related to some undetermined
combination of posturing and vibratory initiation components of phonation.

Figure 11. Acoustic laryngeal reaction time (LRT) in msec is shown on the
data point represents the average of five responses per subject
pooled across the five subjects in each group. Also shown are
single standard deviation dispersions above and below the mean.
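
The pooled summary described for Figure 11 (a mean for each group at each foreperiod, with a dispersion of one standard deviation above and below it) could be computed along the following lines. This is a sketch under assumptions rather than the authors' procedure: the function name and the tuple layout of a trial are hypothetical, and the use of the sample (n - 1) standard deviation is a guess, since the text does not say which form was used.

```python
from statistics import mean, stdev


def summarize_lrt(trials):
    """Pool laryngeal reaction times by group and foreperiod.

    `trials` is an iterable of (group, foreperiod_msec, lrt_msec) tuples,
    one per response, already pooled across the subjects of a group.
    Returns {(group, foreperiod_msec): (mean, sd, n)}. The sd is the
    sample standard deviation (an assumption), or 0.0 for a single value.
    """
    pooled = {}
    for group, foreperiod, lrt in trials:
        pooled.setdefault((group, foreperiod), []).append(lrt)
    summary = {}
    for key, values in pooled.items():
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), sd, len(values))
    return summary
```

With five responses from each of five subjects, every (group, foreperiod) cell would hold 25 values, and the plotted dispersion bars would run from the mean minus the sd to the mean plus the sd.
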

While we believe that we have pushed acoustic measurements to their
limits with respect to characterizing the differences between subject groups
in responding to the reaction stimulus, there remain many unanswered questions
with respect to the strategies that subjects use for preparing their responses
and finally initiating vocal fold vibrations. For instance, is the
coordination between laryngeal and respiratory activity different? What are
the effects of supralaryngeal articulations on stutterers' delayed onset
latencies? Of course, we are interested in characterizing the laryngeal
contribution to the delay in phonation. For instance, as suggested by the
results of Freeman and Ushijima (1978), it may be that some stutterers
simultaneously contract abductor and adductor muscles so that they are unable
to position the vocal folds for phonation until they achieve appropriate
control over these muscles. Some stutterers may be able to position the vocal
folds successfully, but delay the initiation of vibration due to an
inappropriately high level of vocal fold tension, perhaps by inappropriate
levels of cricothyroid or vocalis muscle activity. To investigate questions
at these levels, more direct measurements of respiratory, laryngeal, and
articulatory behaviors are required. For instance, to investigate further the
"positioning" versus the "initiation of vibration" hypothesis, we plan to
complement acoustic data with movement data from high-speed filming and
transillumination, and with EMG data from laryngeal adductor and abductor
muscles. Only through simultaneous measurements taken from acoustic,
movement, and EMG levels can abnormal laryngeal control ultimately be
described and understood more fully.

CONCLUSION

In this paper, we have presented evidence that acoustic measurements
alone, or in fact measurements in any single domain, often provide incomplete
information in studies of speech production. While we realize that some of
the measurement techniques are prohibitively complex and expensive, modern
developments have increased the repertoire of instruments that are financially
and technically accessible to a typical speech laboratory and that can be used
without medical supervision. Examples of these are surface electromyography
for the lips, measurements of airflow and pressure in the upper vocal tract,
electroglottography, and strain gauge measurement of lip and jaw movements.
Other instruments such as opto-electronic movement transducers (e.g.,
Selspot), dynamic electropalatography, and ultrasonic devices are more
expensive but easy to use. With the cooperation of a physician, the repertoire
can be increased to include mildly invasive procedures. These include
hooked-wire electromyography of the articulatory muscles, cineradiography,
fiberoptic endoscopy, and the associated procedure of transillumination.
Although these procedures may require the cooperation of a medical doctor,
they do not require a high degree of specialized medical training.

There remain some measurement techniques that cannot be performed without
complex and expensive equipment, such as the computer-controlled x-ray
microbeam system mentioned above, or the assistance of a specially trained
physician to perform procedures such as electromyography of the laryngeal
muscles. Only a few research centers throughout the country are presently
equipped to perform experiments at this level, and as technological advances
make even more complex equipment available, it is not likely that more than a
few laboratories will ever be able to purchase, maintain, and operate the
laboratory equipment of the future. We have argued in this paper that the
information gained from simultaneous measurements is worth the difficulties
associated with making them. The complexity of experimentation, and the value
of coordinated measures, taken together, argue for the support of at least
some centralized laboratories, which maintain appropriate facilities for
cooperative experimentation.



THE RELATION BETWEEN PRONUNCIATION AND RECOGNITION OF PRINTED WORDS
IN DEEP AND SHALLOW ORTHOGRAPHIES

Leonard Katz and Laurie B. Feldman



Abstract. The processes responsible for recognition and
pronunciation of printed words were studied by means of lexical
decision and naming experiments. Two languages were examined:
English, which has a complex and deep correspondence between
spelling and speech, and Serbo-Croatian, in which the correspondence
is simpler and more direct. It was hypothesized that reliance on
phonetic coding would be greater for Serbo-Croatian because its
shallow orthography would allow more efficient use of
spelling-to-speech correspondences. Each target stimulus was preceded
by a word that was either related or unrelated semantically. Semantic
priming of target words facilitated performance in both lexical
decision and naming for English, suggesting an influence of the
internal lexicon on both processes. In contrast, semantic priming
facilitated only lexical decision for Serbo-Croatian, suggesting that
naming, at least in that language, is not strongly influenced by the
internal lexicon. Further, in Serbo-Croatian, lexical decision and
naming latencies were correlated only when both tasks were not
semantically primed and were uncorrelated when either or both tasks
received semantic priming. This suggested that phonetic coding is used
in lexical decision, at least under conditions where contextual
semantic facilitation is absent. In contrast, in English, lexical
decision and naming were correlated uniformly whether semantic
facilitation was present or not, which, when considered with the
effect of semantic facilitation on naming, suggested a stronger
influence of the internal lexicon on both recognition and
pronunciation.

The present experiment is concerned with the relation between word
pronunciation and word recognition. The alphabet is, of course, the primary
tool for specifying the pronunciation of written words; children are
instructed in its grapheme-to-phoneme correspondences when they are taught to
read. Young readers demonstrate this knowledge by reading aloud and,
particularly, by sounding out words that are new to them. Even so, skilled
reading involves silent reading, and it is not clear to what extent phonetic
coding still mediates word recognition for the skilled reader.

A related question concerns the pronunciation of familiar words. Does 
the skilled reader pronounce these words directly by means of spelling-to- 
speech correspondence rules (as the beginning reader might) or, instead, is 
the pronunciation accessed as a stored lexical memory along with the meaning 
of the word? In other words, is pronunciation mediated by the internal
lexicon?

The correspondence between English orthography and speech is highly
abstract (involving complex rules), because the orthography principally
references the morphophonemic level of English (Chomsky & Halle, 1968). It has
been argued, therefore, that faster word recognition will occur with a
strategy that avoids phonetic mediation. According to this argument, then,
languages with different degrees of complexity in their spelling-to-speech
correspondence should show appropriately different degrees of dependence on
phonetic coding. In particular, readers should utilize phonetic coding more
often when reading an orthography that has a more direct correspondence
between grapheme and phone than does English. In addition, because phonetic
coding may be easier for readers of a more direct orthography, these readers
may depend less on lexical mediation for the pronunciation of printed words.
Instead, the simpler spelling-to-speech correspondences may be more efficient
(in terms of speed of access and storage space) than a lexically mediated
system. We are suggesting, then, that a reader's use of phonetic coding for
either word recognition or pronunciation or both may depend, in part, on the
nature of the relation between the orthography and the spoken language.

The present experiments test these notions in two ways. First, we
compare the processes of pronunciation and word recognition in English (with
its deep orthography) and Serbo-Croatian, a language whose shallow alphabetic
orthography was designed in the last century on the principle, "Spell it as it
sounds; say it as it is written." The spelling-to-sound correspondence is so
consistently simple that even minor dialectal variation in the speech is
mirrored in the orthography.1 Secondly, we attempt to manipulate the degree
of lexical mediation by varying the semantic relation between a prime and the
target stimulus on each trial (e.g., the stimulus to be either pronounced or
recognized). If the internal lexicon is involved in pronunciation as well as
in recognition, then there should be an effect of semantic priming on both.
For English, we expect that lexical decision and naming will both be affected
by semantic priming, showing that naming is, to some extent, lexically
mediated. For Serbo-Croatian, on the other hand, we expect that lexical
decision, but not naming, will be affected by semantic priming, showing that
naming occurs without lexical involvement. The most likely basis for a
pre-lexical naming response is a process based on spelling-to-speech
correspondences, i.e., a process culminating in a phonetic code. Thus, we have
a basis for assessing the notion that the complexity of the relation between
orthography and phonology will determine a skilled reader's reliance on
phonetic mediation.

In the present experiments, lexical decision and naming tasks are used to
study word recognition and pronunciation, respectively. The experimental
rationale is similar to that used by Forster and Chambers (1973) and consists
of two parts. First, because the same words are presented in both lexical
decision and naming, the relative reaction times among words can be compared
between tasks; a positive correlation between tasks indicates a commonality of
origin for lexical decisions and naming. Conversely, a zero correlation
suggests that the lexical decision and naming processes are independent. If a
positive correlation is found, an attempt can be made to determine the causal
direction of the variables in the correlation. A positive correlation could
mean that naming mediates lexical decision, that lexical decision mediates
naming, or that both are determined by a third factor. This ambiguity can
potentially be resolved in the second part of the approach, in which a
variable is manipulated that affects lexical search but, putatively, should
not affect any phonetic recoding that precedes lexical search.

Forster and Chambers (1973) found a moderate correlation between reaction
times in lexical decision and naming (r = .55). This suggested that the two
tasks had substantial commonality. The authors believed that word frequency
determined the underlying organization of the internal lexicon and, therefore,
should affect those processes that were dependent on lexical access. Because
Forster and Chambers considered word frequency to be a principle of lexical
organization exclusively, they interpreted a word frequency effect in the
naming task (high frequency words were named faster) as evidence that naming
is lexically mediated. Lexical mediation for naming effectively precludes the
first of the possibilities described above; that is, if lexical access
precedes the phonetic processes leading to the articulation of a printed word,
it is unlikely that the code a reader uses for input to the lexicon would be
an articulatory code. Forster and Chambers' results suggested that the
specification for pronunciation is stored in memory and is accessed along with
a word's meaning. They report some internal experimental assessment of the
assumption that word frequency is a variable that affects lexical access but
not pre-lexical processing.

In the present study, we chose semantic priming as a manipulation that
should affect lexically mediated processing but should not affect pre-lexical
processing. Other investigators have demonstrated, in English, a facilitating
effect of semantic context on both lexical decision and naming (Becker &
Killion, 1977; Meyer, Schvaneveldt, & Ruddy, 1975), which suggests that, for
English, the naming task involves at least some mediation by the internal
lexicon. However, because none of these investigators presented correlations
between the two tasks, we do not know the extent of processing similarity.
For Serbo-Croatian, no previous data exist that indicate semantic facilitation
of either lexical decision or naming.

In summary, we tested two hypotheses concerning the role of phonetic
coding in lexical decision. First, we tested the hypothesis that phonetic
coding precedes lexical access in word recognition by looking for (1) the
absence of semantic priming effects on naming, and (2) a positive correlation
between lexical decision and naming. The second hypothesis we tested was the
notion that readers' reliance on phonetic recoding for lexical access is
directly related to the simplicity of the correspondence between the
orthography and the classical phonemics of their language. Thus, readers of
Serbo-Croatian (a language that has a simple, shallow orthography) should
depend more on phonetic coding than readers of English.

METHOD

Subjects

Fifty-six students from the Faculty of Philosophy at the University of
Belgrade and 67 students from the University of Connecticut participated in
the experiment in partial fulfillment of requirements for a course in
Introductory Psychology. All Yugoslav subjects had participated previously in
reaction time experiments, but the American subjects, in general, had not.
All subjects were native speakers of their respective languages. There were
14 Yugoslav subjects within each of four experimental conditions. The number
of English subjects in each group varied between 16 and 18; 72 subjects were
tested, but the data of five were excluded due to error rates exceeding 15%.
No Yugoslav subjects approached this error rate and no data were excluded.

Stimuli 

Target words were 59 nouns in English and 59 nouns in Serbo-Croatian, all
judged to be familiar to college students. The two sets of nouns contained
largely words that were mutual translations. For both languages, the length
of target words varied from four to nine letters. Fifty-nine English
pseudowords and 59 Serbo-Croatian pseudowords were generated from the real
words by changing two or three letters of each word. Vowels were substituted
for vowels and consonants were substituted for consonants. For each word, a
semantically related priming word was selected such that this prime
represented either a synonym or a superordinate semantic class for the target
word. Pseudowords were also paired with primes that were not related to the
pseudowords in any obvious way. Stimuli were typed in the Roman alphabet in
the center of 35 mm Prime U Film slides.
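
The pseudoword construction rule just described (change two or three letters, vowel for vowel and consonant for consonant) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' procedure: the function name, the letter inventories (plain a-z, with "y" treated as a consonant), and the example word are hypothetical, and the Serbo-Croatian alphabet would need its own inventories. A generated string could also happen to be a real word or be hard to pronounce, so candidates would still need screening by hand.

```python
import random

# Assumed letter classes for an English-like alphabet (hypothetical choice).
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def make_pseudoword(word, n_changes, rng):
    """Replace n_changes letters, vowel for vowel and consonant for consonant."""
    letters = list(word.lower())
    if not all(ch in VOWELS or ch in CONSONANTS for ch in letters):
        raise ValueError("expects unaccented a-z letters only")
    if not 0 <= n_changes <= len(letters):
        raise ValueError("n_changes must lie between 0 and the word length")
    # Pick distinct positions so that exactly n_changes letters differ.
    for pos in rng.sample(range(len(letters)), n_changes):
        pool = VOWELS if letters[pos] in VOWELS else CONSONANTS
        # Exclude the original letter so the chosen position really changes.
        letters[pos] = rng.choice([ch for ch in pool if ch != letters[pos]])
    return "".join(letters)


rng = random.Random(0)           # fixed seed so the sketch is repeatable
n_changes = rng.choice([2, 3])   # "two or three letters of each word"
example = make_pseudoword("garden", n_changes, rng)
```

Sampling positions without replacement keeps the number of changed letters exact, which matches the stated rule more closely than mutating positions independently.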

Three experimental lists were composed for each language. One list (used
for the "semantically related prime" condition) contained 59 prime-target word
pairs, each of which was semantically related, and 59 prime-target pseudoword
pairs. Also, two lists consisting of semantically unrelated words were
constructed for purposes of generality. Both contained the same prime-target
pseudoword pairs as in the semantically related list but different
prime-target word pairs. The sequence of target words was constant for all
three lists.

Procedure

Subjects received either a "semantically related" or a "semantically
unrelated" list. In both conditions, a prime was presented for 300 msec in
one channel of a three-channel Scientific Prototype Model GB Tachistoscope.
After the prime, a lighted blank field appeared for 300 msec, and then the
target item was presented in another channel for 3000 msec. A sequence of 28
practice items, identical for all experimental groups, preceded the
experimental sequence. In practice, the relation of prime to target was
semantically neutral.
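
The timing of a single trial follows directly from the durations given above. The sketch below simply lays those durations out as a sequence of events; the class and function names are hypothetical, and nothing here models the tachistoscope itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialEvent:
    name: str
    onset_msec: int      # time from prime onset
    duration_msec: int


def trial_timeline(prime_msec=300, blank_msec=300, target_msec=3000):
    """Return the three display events of one trial, in presentation order."""
    events = []
    onset = 0
    for name, duration in (
        ("prime", prime_msec),
        ("blank field", blank_msec),
        ("target", target_msec),
    ):
        events.append(TrialEvent(name, onset, duration))
        onset += duration
    return events


timeline = trial_timeline()
# With the durations stated in the text, the target appears 300 + 300 msec
# after prime onset.
target_onset = timeline[2].onset_msec
```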

In the lexical decision task, subjects had to decide whether the target
was a word and indicate their responses by pressing one of two telegraph keys.
In the naming conditions, subjects were required to pronounce each target word
or pseudoword as quickly and as distinctly as possible. Reaction time was
measured from the onset of the target word by a voice-operated Schmitt trigger
relay. In order to ensure that subjects were reading the primes, they were
asked by the experimenter to report the prime item. The inquiry immediately
followed the subject's response. Inquiries occurred quasi-randomly with at
least one inquiry within a run of ten target items. Subjects were almost
always able to report the prime.

In summary, orthography (Serbo-Croatian/English), task (Lexical
Decision/Naming), and prime condition (Semantically Related/Unrelated) were
between-subjects variables. All four groups within a given language received
the same 59 words and 59 pseudowords as targets. In the semantically related
condition, the word targets were preceded by semantically related prime words,
and the pseudowords were preceded by (necessarily) unrelated prime words. In
the unrelated condition, the same prime words were reordered randomly so that
there was no obvious semantic relation between each target and its prime.
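
One way to picture the construction of the unrelated condition is as a reshuffling of primes across a fixed sequence of targets. The sketch below is a hypothetical illustration only: the function name and the word pairs are invented, and the automatic check guarantees merely that no prime stays with its original target. Whether a reordered pair is free of any "obvious semantic relation" would still have to be judged by a person.

```python
import random


def reorder_primes(pairs, rng, max_tries=1000):
    """Shuffle primes across targets so that no prime keeps its own target.

    pairs is a list of (prime, target) tuples; target order is preserved.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to reorder primes")
    primes = [prime for prime, _ in pairs]
    targets = [target for _, target in pairs]
    for _ in range(max_tries):
        shuffled = primes[:]
        rng.shuffle(shuffled)
        # Accept only orderings in which every prime has moved.
        if all(new != old for new, old in zip(shuffled, primes)):
            return list(zip(shuffled, targets))
    raise RuntimeError("no acceptable reordering found")


# Hypothetical related pairs (superordinate class or synonym as the prime).
related = [("animal", "dog"), ("fruit", "apple"),
           ("tool", "hammer"), ("couch", "sofa")]
unrelated = reorder_primes(related, random.Random(0))
```

Keeping the targets in place mirrors the statement that the sequence of target words was constant across lists.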

RESULTS 

Errors 

Mean error percentages are presented in Table 1. In the lexical decision
task, error rates are low in all experimental conditions but are slightly
higher for English than for Serbo-Croatian. In the naming task, most errors
were made in pronouncing English pseudowords. The error rates in the other
conditions are low. Nearly all errors were mispronunciations or incomplete
utterances (e.g., only the first syllable of a multisyllabic pseudoword).
There were a few omissions of an entire pseudoword. A liberal criterion was
used by the experimenter in judging the acceptability of a pronunciation. If
the pronunciation appeared to be based on an analogy with a real English word,
or was otherwise reasonable according to common pronunciation rules, it was
accepted. Furthermore, slight hesitations or slurring of sounds within the
pseudoword were not counted as errors. Thus, most errors consisted of
consonant substitutions. In cases of doubt, the experimenter transcribed the
subject's response and consulted the first author.

Analyses of variance were performed for the two tasks, using the error
percentage on words and pseudowords for each subject. For the lexical
decision task, only the overall difference between English and Serbo-Croatian
was significant, F(1,58) = 10.96, MSe = .0009, p < .01. The difference
between the two languages was also significant in the analysis of variance for
the naming task, F(1,58) = 11.86, MSe = .0[illegible], p < .01, and, in
addition, the difference between words and pseudowords was significant,
F(1,58) = 47.79, MSe = .0014, p < .001. The three-way interaction between
orthography (English vs. Serbo-Croatian), word-pseudoword, and semantic
relatedness was marginally significant, F(1,58) = 4.34, MSe = .0014, p = .04,
reflecting the presence of a slight simple interaction between semantic
relatedness and word-pseudoword for English but not for Serbo-Croatian. Most
importantly, the interaction between orthography and word-pseudoword was
strongly significant, F(1,58) = 19.67, MSe = .0014, p < .001, consistent with
the observation made above that the highest error rate occurred for English
pseudowords.
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Table 1 



Mean Error Percentages for Word and Pseudoword Targets as a Function
of the Semantic Relation Between Prime and Target Word.

                    Lexical Decision            Naming
                  Unrelated   Related    Unrelated   Related

Serbo-Croatian
  [Only three of the four value pairs survive in the source, and their
  column positions are uncertain. In source order:]
  Pseudoword: 2, 5, 5
  Word:       1, 3, 4

English
  Pseudoword          4          3          10         12
  Word                5          3           4          2


Reaction Times

Mean reaction times were calculated for correct responses on word trials
and pseudoword trials. Figure 1 presents the mean reaction times for the
lexical decision and naming tasks. Inspection of the figure suggests that,
for both English and Serbo-Croatian readers, lexical decisions to words were
facilitated by semantically related priming. However, for the naming task, a
different result obtains. For Serbo-Croatian readers, word naming is not
facilitated by semantically related priming, while for English readers, the
naming task results are similar to those of the lexical decision task in that
both are facilitated by semantic priming. For pseudowords, a seemingly odd
result was found. The pattern of results parallels that for the words:
semantic facilitation for both English and Serbo-Croatian readers in lexical
decision but semantic facilitation for only the English readers in naming.
This apparent anomaly (semantic facilitation for pseudowords) will be
discussed later.

Comparison of error rates from Table 1 with reaction times from Figure 1
does not suggest any systematic relation between the two measures. In
particular, there is no evidence for a speed-accuracy tradeoff.

Figure 1. Reaction time in milliseconds for word targets primed by
semantically related or unrelated words and for pseudoword targets
preceded by control words.

Analyses of variance for the lexical decision and naming tasks were
performed on the mean reaction time of correct responses both for (a) each
stimulus item averaged over subjects (stimulus analysis) and (b) each
subject's word and pseudoword trials (subject analysis). For the lexical
decision task, the only significant factors were (1) word vs. pseudoword,
min F'(1,127) = 37.88, p < .001 and (2) semantically related vs. semantically
unrelated priming, min F'(1,59) = 4.69, p < .05. For the naming task, the
significant factors were (1) semantic relatedness, min F'(1,60) = 4.91,
p = .03, and more importantly (2) the interaction of semantic relatedness and
orthography (English vs. Serbo-Croatian), min F'(1,60) = 6.092, p < .02. In
addition, the naming task analysis produced significant effects for (3) word
vs. pseudoword, min F'(1,167) = 108.71, p < .001 and (4) the interaction of
orthography and word-pseudoword, min F'(1,169) = 11.[illegible], p < .001.
These results suggest that semantically related priming aids English readers
in both word recognition (lexical decision) and word naming but, for
Serbo-Croatian readers, semantically related priming aids only word
recognition.
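
The min F' values above combine a subject analysis and a stimulus analysis. The report does not spell out the computation; the sketch below uses the formula commonly attributed to Clark (1973), which is assumed rather than confirmed to be the one the authors applied. The function name and the example F ratios are hypothetical.

```python
def min_f_prime(f1, df1_error, f2, df2_error):
    """Return (min F', denominator df) from two F ratios for the same effect.

    f1, df1_error: F ratio and error df from the subject analysis.
    f2, df2_error: F ratio and error df from the stimulus (item) analysis.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("both F ratios must be positive")
    if df1_error <= 0 or df2_error <= 0:
        raise ValueError("error degrees of freedom must be positive")
    min_f = (f1 * f2) / (f1 + f2)
    # The denominator df is generally not an integer and is rounded in reports.
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return min_f, denom_df


# Hypothetical inputs, not values taken from this study.
value, denom_df = min_f_prime(f1=8.0, df1_error=58, f2=12.0, df2_error=76)
```

Because min F' can never exceed the smaller of the two F ratios, an effect reported as significant by min F' has held up against both subject and item variability.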

Correlations 

The suggestion of a similarity between lexical decision and naming for
English readers but not for Serbo-Croatian readers receives further support
from correlations calculated between lexical decision and naming. Mean
reaction times were calculated (averaged over subjects) for each of the 39
words and 39 pseudowords in each of the four experimental conditions within
each language, i.e., for the semantically related and semantically unrelated
treatment conditions in the lexical decision and naming tasks. Table 2
presents these intercorrelations. In addition to the correlations between
conditions within each language, we have included correlations between English
and Serbo-Croatian. These latter correlations are based on each item's
ordinal position in the list of trials, i.e., the first item on the English
list was paired with the first item on the Serbo-Croatian list, etc. These
correlations are included because they give an index of the covariation
between conditions due to secondary sources such as practice, fatigue, etc.,
and so provide a baseline against which the other correlations may be
evaluated. Correlations based on mean reaction time for each of 39 words in
each of the eight experimental conditions are entered above the diagonal in
the correlation matrix. Below the diagonal are the correlations based on the
mean reaction time for each of the 39 pseudowords in each of the eight
experimental conditions. All correlations have 37 degrees of freedom;
correlations above 0.26 are significant, p < .05.
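
The layout described above, with word correlations above the diagonal and pseudoword correlations below it, can be sketched as follows. This is an illustrative sketch only: the function names and the toy reaction times are hypothetical, and the condition labels stand in for the eight task-by-priming-by-language conditions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of item means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def combined_matrix(word_rts, pseudoword_rts):
    """Build one matrix: words above the diagonal, pseudowords below it.

    Both arguments map a condition label to a list of item mean reaction
    times, with the conditions listed in the same order.
    """
    conditions = list(word_rts)
    if conditions != list(pseudoword_rts):
        raise ValueError("both inputs must list the same conditions in order")
    size = len(conditions)
    matrix = [[None] * size for _ in range(size)]  # diagonal stays empty
    for i in range(size):
        for j in range(i + 1, size):
            a, b = conditions[i], conditions[j]
            matrix[i][j] = pearson_r(word_rts[a], word_rts[b])
            matrix[j][i] = pearson_r(pseudoword_rts[a], pseudoword_rts[b])
    return conditions, matrix


# Toy item means (msec) for two hypothetical conditions and four items.
words = {"LD unrelated": [610, 640, 598, 701], "Naming unrelated": [540, 561, 533, 590]}
pseudowords = {"LD unrelated": [720, 690, 755, 701], "Naming unrelated": [600, 640, 615, 655]}
labels, matrix = combined_matrix(words, pseudowords)
```

With 39 items per condition each coefficient would have 39 - 2 = 37 degrees of freedom, as stated in the text.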

Pseudoword Correlations 

For pseudowords, some strong correlations obtained. In both
Serbo-Croatian and English, correlations between semantically related and
unrelated conditions were high for the naming task (r = .82 and r = .85,
respectively). For the lexical decision task, the same correlations were
lower but still substantial (r = .57 and r = .68). These high correlations
indicate, for both languages, a strong consistency within tasks in the
processing of pseudowords. They indicate that reliability was sufficient to
produce substantial correlations. Nevertheless, four between-task
correlations for Serbo-Croatian were nonsignificant, suggesting that there
was little or no commonality between lexical decision and naming in the
processing of pseudowords. In contrast, two of the four between-task
correlations were statistically significant for English. The correlation
between the related prime conditions for lexical decision and naming was .34
and the correlation between related lexical decision and unrelated naming was
.[illegible]. Their difference was not statistically significant.
Nevertheless, only the larger correlation was significantly different from
its Serbo-Croatian counterpart (r = .01). Thus, there is strong evidence that
pseudowords were processed similarly within tasks, whether or not the
experimental manipulation involved semantically related priming. In addition,
there is no evidence to suggest processing similarities between tasks for
Serbo-Croatian. Finally, there is equivocal evidence suggesting some
between-task commonality for English pseudowords.

Table 2

Correlations of Mean Stimulus Item Reaction Time Between Semantically
Unrelated and Related Priming Conditions in Lexical Decision and
Naming Tasks for Serbo-Croatian and English Readers. (Correlations
for words are entered above the diagonal and correlations for
pseudowords are entered below the diagonal.)

Word Correlations 

Of major interest are the four correlations between lexical decision and
naming for words. For Serbo-Croatian readers, only one of these is
significant: the correlation between the two conditions in which the prime
was semantically unrelated to the target (r = .32). This correlation is as
strong as that found by Feldman (1981) and is about as strong as the
correlations within tasks (i.e., between semantically unrelated and related
priming for lexical decision, r = .33, and for naming, r = .31). Otherwise,
the remaining correlations between tasks are nonsignificant. Thus, the
commonality between lexical decision and naming changes as a function of the
semantic relatedness between prime and targets. The similarity between tasks
is strongest when there is least involvement of the internal lexicon, that
is, when there is no semantically related priming. The process of word
recognition is most like the process of word naming when subjects cannot use
semantic coding as an aid.

A quite different pattern of correlations was found for the English
readers. Here, the correlations between lexical decision and naming were all
significant, although only of moderate size, ranging from .30 to .44. There
are no statistically significant differences among them nor do they differ
statistically from the only significant Serbo-Croatian correlation between
tasks (r = .32). Thus, in contrast to Serbo-Croatian, lexical decision and
naming in English share a moderate amount of processing commonality among all
experimental conditions. This commonality is not affected by the semantic
relatedness between prime and target.

The differences between Serbo-Croatian and English in the size of the
correlations did not appear to be due to artifacts related to differences in
the variances of the contributing variables. Inspection of the standard
deviations of the sixteen variables whose correlations are given in Table 2
indicated general homogeneity. In addition, not all of the critical
comparisons discussed above could be attributed to any heterogeneity that did
exist. For example, the standard deviations for semantically related and
unrelated word naming, respectively, were 49 msec and 93 msec for
Serbo-Croatian and 34 msec and 32 msec for English, but the correlation for
English was by far the larger (.68 vs. .31) in spite of its having a smaller
standard deviation for semantically unrelated naming.

DISCUSSION

Word Naming

These results address, most directly, the question of the mechanism by
which printed words are pronounced. For English, the word naming process
appears to be mediated, at least in part, by the internal lexicon. The major
evidence that supported this suggestion was the finding that word
pronunciation was facilitated when the target word was preceded by a
semantically related word. This result is direct evidence of lexical
involvement because semantic relations between words are viewed as an
exclusive property of the lexicon. Secondly, there was correlational evidence
consistent with the hypothesis of lexical involvement in pronunciation; naming
latencies and lexical decision latencies were not uncorrelated. Because the
lexical decision task requires the subject to access his or her internal
lexicon, the absence of a positive correlation would have been inconsistent
with the major finding.

The present results are in agreement with studies of Becker and Killion
(1977) and Meyer et al. (1975), both of whom found semantic priming effects
on word naming in English. In addition, the argument for lexical involvement
in pronunciation is strengthened by the studies of Forster and Chambers (1973)
and Frederiksen and Kroll (1976), who found word naming latencies to be
affected by word frequency, a putative lexical factor. Nevertheless, none of
the data we have discussed indicates that lexical mediation is the sole
mechanism for pronouncing printed English words. It is obvious that
pronunciation in English is not always accomplished solely by lexical
look-up; some spelling-to-speech correspondences must be applied at least to
new words. Further, Baron and Strawson (1976) presented data supporting the
suggestion that pronunciation in English is accomplished, even by skilled
readers, by using the two mechanisms of lexical mediation and
spelling-to-speech correspondence rules. Recently Navon and Shimron (1981)
demonstrated that grapheme-to-phoneme coding is typically used in naming, at
least in part, by readers of Hebrew, despite the Hebrew orthography, whose
design would seem to favor an alphabetic principle (i.e., grapheme-to-phoneme
coding) even less and a lexical mechanism even more than the orthography of
English.

In the present study, we compared the English orthography, which has a
deep, complex correspondence to speech, with the Serbo-Croatian orthography,
whose simple, direct correspondence to speech constitutes an extreme
application of the alphabetic principle. The question of interest was whether
the degree of lexical mediation found in English word naming would also be
found in Serbo-Croatian, or, instead, lexical involvement would be reduced in
Serbo-Croatian because of the more efficient spelling-to-speech
correspondence in that orthography. The data clearly supported the latter
alternative; semantic priming did not facilitate Serbo-Croatian word naming.
Also, with one exception (discussed below), pronunciation latencies were
uncorrelated with lexical decision latencies, further supporting the notion
that lexical mediation plays a lesser role in naming in Serbo-Croatian than in
English.

Word Lexical Decision

The major questions asked about word recognition were whether it is
mediated, at least in part, by phonetic coding, and if so, whether the
influence of phonetic coding is greater for the Serbo-Croatian orthography
than for English. For English, there was no evidence in support of a
mediating phonetic process. This is consistent with previous results that
offer little support for the use of phonetic codes in skilled word recognition
in English (see McCusker, Hillinger, & Bias, 1981, for a review). However,
for Serbo-Croatian, the results suggested that phonetic coding precedes word
recognition, at least sometimes. Although a facilitating effect of semantic
priming occurred for both English and Serbo-Croatian (and, therefore,
indicated at least some involvement of the internal lexicon for both), the two
orthographies differed importantly in the pattern of correlations between
lexical decision and naming, suggesting some coding differences. For English
there were moderate sized correlations between the two tasks, but the
correlations did not vary as a function of the semantic relatedness between
prime and target. That is, whether the prime had been related to the target
or not, the relative reaction times among the target words remained fairly
constant. This occurred in spite of an overall decrease in reaction time for
all words when the prime was, in fact, semantically related to the target.
Thus, for English there was a general consistent commonality of processing
between lexical decision and naming.

In contrast, for Serbo-Croatian, the two tasks were not correlated when
either or both of the tasks had received semantic priming. Only when neither
task was semantically primed did they correlate. It appears that there was a
processing similarity between word recognition and naming only when there was
the least involvement of the internal lexicon. This suggests that, when the
lexical search process in lexical decision received semantic priming, it
utilized, to a degree, the same kind of informational code as that which the
pronunciation process used when it received no semantic priming. Presumably,
this was not a lexical code because semantic priming had no facilitating
effect on naming. Further, because this pattern of correlations occurred for
Serbo-Croatian and not for English, it is plausible to ascribe the differences
to their differences in orthographic depth; for Serbo-Croatian, phonetic
coding is more easily achieved and, therefore, more likely to be used for word
recognition.

There is, however, one result that is superficially inconsistent with
this interpretation: semantically primed naming did not correlate
significantly with semantically unrelated lexical decision. If semantic
priming truly had no effect on Serbo-Croatian naming, then both the
semantically unrelated and the semantically related naming conditions should
have behaved similarly and should have correlated significantly with unrelated
lexical decision. However, this failure is somewhat mitigated by a
nonsignificant difference between the two correlations. A tentative
explanation for the smaller correlation may be that (1) semantic priming did
occasionally stimulate the use of a lexical route to pronunciation, but
(2) this route was not more efficient than the other. The occasional use of
the alternate semantic route could have been sufficient to weaken the
correlation between semantically related naming and semantically unrelated
lexical decision.

Pseudowords

The pseudoword error data support the argument that the use of phonetic
coding in naming is more prevalent in Serbo-Croatian than in English. The
English readers made many more errors in pronouncing pseudowords ([...]0% and
12%) than did the Serbo-Croatian readers (5%) even though there was no such
discrepancy in pronouncing real words (where all error rates were in the range
of 2% to [...]). If it can be assumed that pseudowords in both languages were
equally wordlike with regard to spelling pattern (no pseudowords were
orthographically irregular), then these error data underline the relative
difficulty in pronouncing unfamiliar English print, whatever the pronunciation
strategies are, whether a strict application of spelling-to-speech
correspondence or a dependence on analogies to the pronunciations of familiar
words.

A second result that we found for pseudowords appears, at first glance,
to be anomalous: the effect of semantic relatedness on pseudoword latency in
all experimental groups except Serbo-Croatian naming (see Figure 1). However,
a retrieval strategy effect may account for this result. Obviously,
pseudowords could not have been helped by receiving priming cues that pointed
to a semantically defined address in memory: pseudowords have no address in
memory. But subjects in the semantically related conditions may have depended
on using the information in the primes to facilitate a memory search for the
target and, accordingly, may have used this expectation to reduce their
criterion time for converging on a true lexical entry; targets not found
before the criterion limit would be classified as nonwords.

Other investigators have also observed semantic facilitation for
pseudowords, under certain conditions. Posner and Snyder (1975), using a
match-mismatch paradigm, found that reaction times to mismatched target items
were faster following a word or letter that did not predict the target than
when following an asterisk that was equally unpredictive. In two studies,
Neely (1976, 1977) found that reaction times to pseudowords that followed word
primes were faster than those that followed a neutral string of X's. Neely's
(1977) explanation of these results suggested that subjects adopted a strategy
of attempting to find common semantic features between the prime and the
target, an explanation not incompatible with our own explanation for the
results of the present experiment. According to Neely's approach, subjects in
our semantically related conditions could have tried (more than other
subjects) to use the semantic information that was common between prime and
target in order to decide on the lexical existence of a target item. The
presence of common semantic features (as for word targets) or the absence of
common semantic features (as for pseudoword targets) could have speeded the
time to make appropriate responses. Note that if this explanation is
accurate, then the presence of semantic facilitation for pseudowords in a
naming task is additional evidence that the naming process is at least partly
mediated by the internal lexicon. For the present experiments, the pseudoword
data contribute to the evidence that naming is lexically mediated in English
but not in Serbo-Croatian. Unfortunately, any detailed explanation for the
priming effect on pseudowords must wait for a future experiment; only
explanations of limited generality can be proffered here. Nevertheless, it is
an important question to pursue. The appearance of the phenomenon in several
experiments attests to its robustness, and its explanation should shed light
on the process of word recognition.

INFANT INTERMODAL SPEECH PERCEPTION IS A LEFT HEMISPHERE FUNCTION*

Kristine S. MacKain,+ Michael Studdert-Kennedy,++ Susan Spieker,+ and Daniel
Stern+



Abstract. Prelinguistic infants recognized structural correspondences
in acoustic and optic properties of synchronized, naturally
spoken disyllables, but did so only when they were looking to their
right side. This suggests that intermodal speech perception is
facilitated by rightward orientation of attention and subserved by
the left hemisphere.

Five- to six-month-old infants recognized structural correspondences
between synchronized acoustic and optic displays of naturally spoken
disyllables only when they were looking to their right side. This suggests
that intermodal perception of speech is a left hemisphere function with a
potential role to play in the infant's learning to speak.

Research on infants' capacities for intermodal perception has
demonstrated repeatedly that infants are sensitive to correspondences in the
acoustic and optic properties that specify an event (Dodd, 1979; Spelke, 1976,
1979; Spelke & Cortelyou, 1981). Such studies have two alternative
interpretations. Infants may prefer a natural pattern of structural
correspondence between the optic and acoustic dimensions of an event by which,
in speech for example, an opening mouth is correlated with a rise in amplitude
and with an upward shift in overall spectral structure, a closing mouth with
the reverse. Alternatively, infants may simply prefer a temporal pattern of
correspondence by which gross points of change in acoustic and optic structure
are synchronized (Spelke, 1979). If infants prefer mere synchrony, we would
expect them to be satisfied with any arbitrary pattern of acoustic-optic
correspondence: Thus, in speech they might have no preference for syllable
amplitude peaks synchronized with an open mouth over syllable amplitude peaks
synchronized with a closed mouth. But if infants prefer natural patterns of
structural correspondence, we would expect them to look longer at the
synchronized video monitor display of a woman producing articulatory patterns
that specify the speech they are hearing than at an alternative, synchronized
video display of the same woman displaying a different articulatory pattern.
We therefore investigated infants' capacity to recognize acoustic-optic
correspondences in speech structure when the synchrony between an acoustic and
two competing optic displays was maintained.

Our preliminary analyses suggested that when acoustic and optic speech
displays specified the same disyllable, intermodal recognition was enhanced if
infants were watching the right, rather than the left, video display.
Kinsbourne and colleagues (Kinsbourne, 1970, 1974; Lempert & Kinsbourne, 1982)
have shown that when adults look to the right (or left) as they complete a
task, their performance is facilitated if the task demands are better
subserved by the hemisphere contralateral to gaze direction. Such results
have been interpreted as evidence that attention, behaviorally manifested by
gaze, may selectively activate the hemisphere contralateral to direction of
gaze. We therefore expected that upon fuller investigation, only rightward
looking would significantly enhance recognition of acoustic-optic
correspondences in speech structure.

Eighteen infants, eight males and ten females, 5-6 months of age
(mean = 5 months, 25 days) participated in the experiment. We used three
pairs of naturally produced consonant-vowel-consonant-vowel (CVCV)
disyllables, spoken with equal stress on both syllables: /mama, lulu/, /bebi,
zuzi/, and /vava, zuzu/. We enhanced the opportunity to detect acoustic-optic
correspondences by making the articulatory dynamics of the contrasting video
displays highly discriminable. To prepare the experimental materials, an
adult female silently articulated each CVCV in synchrony with either the
corresponding or the contrasting spoken disyllables of another adult female.
The voice and the articulating face were recorded simultaneously to appear on
one side of a 28 x 22 cm video monitor screen. The video recording procedure
was then repeated so that the articulating face appeared on the other half of
the split video screen, silently articulating the second CVCV in the pair in
synchrony with the audio playback of the original disyllable. Deviations in
acoustic-optic synchrony were below the adult threshold for detecting
asynchronies.^1 The resulting recording of the acoustic signal synchronized
with two competing articulatory displays was output to two video monitors.

The infant sat 46 cm from the video monitors on its mother's lap at the
open end of a wooden box. The infant viewed a different articulatory display
on the split screen of each monitor, one appearing through the right back
window of the box, the other through the left. The speech corresponding to
one of the two video displays was played at equal loudness from the speakers
of both monitors. A camera placed centrally between the monitors filmed the
infant's visual responses. The mother looked over the roof of the box and
could not see the video displays.

Infants were presented with each of the three CVCV pairs on four trials
for a total of 12 trials. Each member of a CVCV pair occurred twice as an
audio signal, with its matching video display occurring once on the left video
monitor and once on the right. The trials were randomized under the
constraint that no two trials with the same video output immediately followed
one another. Each trial lasted 20 seconds and consisted of 11 auditory-visual
CVCV repetitions. Disyllable durations were about 1000 msec, separated by
interstimulus intervals of about 800 msec. Successive trials began without
interruption between trials. The experimental session lasted four minutes.
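
The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disyllable names follow one reading of the stimulus list, "same video output" is taken to mean the same left/right arrangement of the two silent displays, the shuffle-and-reject procedure is a stand-in (the report does not say how the order was randomized), and the 1000 msec disyllable duration is assumed because it makes 11 repetitions fit a 20-second trial.

```python
import random

# Stimulus pairs as named in the text (spelling is an assumed reading).
PAIRS = [("mama", "lulu"), ("bebi", "zuzi"), ("vava", "zuzu")]

def build_trials():
    """Build the 12 trials: each disyllable serves as the audio signal twice,
    with its matching video once on the left and once on the right."""
    trials = []
    for first, second in PAIRS:
        for audio, other in ((first, second), (second, first)):
            trials.append({"audio": audio, "left": audio, "right": other})
            trials.append({"audio": audio, "left": other, "right": audio})
    return trials

def video(trial):
    """The video output of a trial is its left/right display arrangement."""
    return (trial["left"], trial["right"])

def randomized_order(rng):
    """Shuffle until no two adjacent trials share the same video output
    (simple rejection sampling; an assumption, not the authors' procedure)."""
    trials = build_trials()
    while True:
        rng.shuffle(trials)
        if all(video(a) != video(b) for a, b in zip(trials, trials[1:])):
            return trials

# Timing check, assuming ~1000 msec disyllables and ~800 msec intervals.
TRIAL_MS = 11 * (1000 + 800)   # 19800 msec, i.e. about 20 seconds
SESSION_S = 12 * 20            # 240 seconds, i.e. four minutes
```

Calling `randomized_order(random.Random(0))` returns one admissible order; a different seed gives a different order that satisfies the same constraint.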






From video recordings of the child's face, independent observers recorded
for each trial the duration in seconds of the first fixation to the right and
of the first fixation to the left. We preferred first fixation over total
fixation time because it is less vulnerable to contamination by factors such
as attentional lapse. Interjudge reliability, based on a Pearson product
moment correlation coefficient for 41 randomly selected trials, was r = .96
for left looking time and r = .98 for right looking time.

The direction of the infants' first looks after trial onset was to the
right side on 58% of the total trials (N = 216). Table 1 presents mean first
fixation times in seconds for acoustic-optic matches and mismatches on right
and left sides. The means were taken over six disyllables and summed over 18
infants. It is evident that the longest first fixation times are to matches,
particularly on the right side.

First fixation times varied across infants. Therefore, we obtained
proportions of first fixation time spent looking at acoustic-optic matches
occurring on the right and the left side from each infant for each disyllable.
We thus normalized for variability over subjects and disyllables and, at the
same time, for any general preference for one side over the other.
Proportions were computed by dividing the first fixation time spent looking at
a match (right, left, or both sides) by the total first fixation time for that
comparison, summed across two trials (see Table 2 for comparisons).
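
As a worked illustration of the proportion measure, the sketch below applies the same division to the summed fixation times reported in Table 1 (66.0 and 59.3 seconds on the left, 81.2 and 67.0 seconds on the right, matches first). This is an assumption-laden shortcut: the report computed proportions per infant and per disyllable and then averaged them, so these pooled values need not equal the entries in Table 2.

```python
# Summed mean first-fixation times in seconds, as read from Table 1.
TABLE1 = {
    ("right", "match"): 81.2,
    ("right", "mismatch"): 67.0,
    ("left", "match"): 66.0,
    ("left", "mismatch"): 59.3,
}

def proportion(target, other):
    """Share of the combined fixation time that went to `target`."""
    return target / (target + other)

# Right matches vs. right mismatches.
right = proportion(TABLE1[("right", "match")], TABLE1[("right", "mismatch")])
# Left matches vs. left mismatches.
left = proportion(TABLE1[("left", "match")], TABLE1[("left", "mismatch")])
# Right matches vs. left matches.
right_vs_left = proportion(TABLE1[("right", "match")], TABLE1[("left", "match")])
# Matches vs. mismatches with both sides pooled.
overall = proportion(
    TABLE1[("right", "match")] + TABLE1[("left", "match")],
    TABLE1[("right", "mismatch")] + TABLE1[("left", "mismatch")],
)
```

With these inputs the pooled overall proportion rounds to .54, and the right-side proportion exceeds the left-side one, which is the direction of the effect the per-infant analysis tests.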

The overall proportion of total (right and left) first fixation time
spent looking at matches (mean = .54) rather than mismatches was significant
(z = 2.64, p < .004; this and subsequent tests are Wilcoxon matched pairs
signed ranks tests, one-tailed). Table 2 summarizes the remaining results.
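
For readers unfamiliar with the test, the following is a minimal sketch of a Wilcoxon matched-pairs signed-ranks z statistic using the large-sample normal approximation. It assumes that zero differences are dropped (which would account for an n of 17 with one tie among 18 infants), that tied absolute differences share an average rank, and that no continuity or tie correction is applied to the variance. The report does not state which variant was used, so the function names and these choices are illustrative only.

```python
import math

def wilcoxon_z(differences):
    """Return (z, n) for the signed-ranks test, normal approximation.

    Zero differences are discarded; tied absolute differences receive the
    average of the ranks they span. No continuity or tie correction."""
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd, n

def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

In this setting each difference would be one infant's proportion of looking at matches minus .50 (or the right-side proportion minus the left-side proportion for the right-left comparison).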

On the right side, the proportion of first fixation time spent looking at
matches was significantly greater than for mismatches overall (z = 2.66,
p < .004) and for three of the six disyllables: mama, bebi, and zuzu (with
respective values of z = 2.46, p < .007, n = 17, one tie; z = 1.94, p < .05,
n = 17, one tie; z = 2.27, p < .01). Proportions were greater than .50 for
all six disyllables. On the left side, the proportion of first fixation time
spent looking at matches was not significantly greater than for mismatches
overall or on any of the six disyllables. Proportions were greater than .50
for only three of the disyllables.

On the right side, the number of infants who spent more than half of
their first fixation time looking at matches versus mismatches was
significant on a binomial test for two disyllables (mama, 13/18, p < .05;
zuzu, 14/18, p < .02), but no corresponding tests for left-side looking were
significant.
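
The binomial probabilities quoted above can be checked directly. The sketch below assumes a one-tailed test against a chance level of .50 with 18 infants, counting 13 or more and 14 or more infants respectively; the function name is illustrative.

```python
from math import comb

def binomial_upper_tail(successes, n, p=0.5):
    """Probability of observing `successes` or more out of n under chance p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )

p_13_of_18 = binomial_upper_tail(13, 18)  # about .048, below .05
p_14_of_18 = binomial_upper_tail(14, 18)  # about .015, below .02
```

Both values fall under the thresholds reported in the text, which is consistent with the counts of 13 and 14 infants out of 18.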

In a right-left comparison, the proportion of first fixation time spent
looking at acoustic-optic matches was significantly greater on the right side
than on the left side overall (z = 2.02, p < .02) and for three out of the six
disyllables: mama, bebi, and zuzu (respectively, z = 1.87, p < .05; z = [...],
p < .05; z = 1.96, p < .05). Right side proportions were greater than left
for all six disyllables (Table 2).







Table 1

First Fixation Times in Seconds, Averaged Across Six Disyllables, to the Left
and Right Video Display When the Display Matched or Mismatched the Audio CVCV.
Mean Fixation Times are Summed Across 18 Infants.

                        Video Display
Direction       Matches         Mismatches
of Gaze         Audio CVCV      Audio CVCV

Left              66.0            59.3
Right             81.2            67.0

Table 2

Proportion of First Fixation Time, Averaged Over 18 Infants, Spent Looking at
Right Matches vs. Right Mismatches, Left Matches vs. Left Mismatches and Right
vs. Left Matches on Six Disyllables.

Proportion of time                       Disyllable
spent looking at         bebi  zuzi  mama  lulu  vava  zuzu   Overall

Right Matches vs.
Right Mismatches         .59   .52   .62   .53   .52   .61     .57

Left Matches vs.
Left Mismatches          .54   .50   .54   .49   .49   .52     .51

Right vs. Left Matches   .57   .57   .61   .52   .58   .59     .57

One potential source of bias, a preference for an optic articulatory
pattern irrespective of the acoustic pattern that accompanied it, might have
influenced these results. To check for this, Spearman rank order correlation
coefficients were computed for preferences for a video display when the audio
signal matched the video display and when it did not match the video display.
We computed correlations for right and left sides combined as well as for each
side separately. A significant positive correlation would indicate that
infants preferred to look at a particular articulatory pattern irrespective of
the CVCV to which they were listening. None of the correlations was
significant.

In summary, infants looked significantly longer at synchronized video
displays of a woman articulating a disyllable synchronized and matched with
what they were hearing, than at an alternative display synchronized but not
matched with what they were hearing. Their preference was therefore for
acoustic-optic correspondences in structure, not for mere synchrony.
Moreover, they displayed this preference only when attending to their right
side.

These findings demonstrate, first, that infants are sensitive to natural
structural correspondences, rather than merely temporal ones, between the
acoustic and optic properties of articulation. Second, and more important,
they indicate mutual facilitation of two left hemisphere functions: rightward
orientation of attention (Kinsbourne, 1970, 1974; Lempert & Kinsbourne, 1982)
and intermodal speech perception. Taken with the well-known dominance of the
left hemisphere in the motor control of speech for adults (Milner, 1974) and
in speech perception for both adults (Studdert-Kennedy & Shankweiler, 1970)
and infants (Molfese, Freeman, & Palermo, 1973; Best, Hoffman, & Glanville,
1982), these results suggest that the normal infant's capacity to begin
reproducing native language speech sounds in prelinguistic babbling (de
Boysson-Bardies, Sagart, & Bacri, 1981) may rest on a predisposition of the
left hemisphere to recognize sensorimotor connections between the auditory
structure of speech and its articulatory source.

FOOTNOTE

^1 Temporal discrepancies in audio-video speech events must reach 131 msec
before they can be detected by adults (Dixon & Spitz, 1980). In our study,
temporal discrepancies between corresponding events on any two video displays
did not exceed 48 msec. Furthermore, there were no significant differences in
seven adults' perceptual judgments of temporal discrepancies between
acoustic-optic matches versus mismatches for any of the six disyllables. We
assume that infants' sensitivity would not be superior to adults' on this
task. The procedures are detailed in MacKain, Studdert-Kennedy, Spieker, and
Stern (in press).

PERCEPTUAL ASSESSMENT OF COARTICULATION IN SEQUENCES OF TWO STOP CONSONANTS*

Bruno H. Repp



Abstract. This study investigated whether any perceptually useful
coarticulatory information is carried by the release bursts and
formant transitions of two successive, nonhomorganic stop
consonants. The VC or CV portions of natural VCCV utterances were
replaced with matched synthetic stimuli from a continuum spanning
the three places of stop articulation. When the VC and CV portions
in the resulting hybrid VCCV stimuli were separated by a fixed
silent interval, the context in which the natural portion had been
produced had no influence on listeners' identification of the
synthetic portion, suggesting that VC and CV formant transitions and
CV release bursts contained no perceptually salient coarticulatory
cues. However, when a natural VC portion was separated from a
synthetic CV portion by the original closure interval, which
included a brief release burst of the first stop, there was a sizeable
effect of the original CV context on the perception of the second
stop consonant. Thus, the release burst of a syllable-final stop
contains significant coarticulatory information about a following,
nonhomorganic stop. This was confirmed by acoustic analyses of the
stimuli. The perceptual data also revealed contrast effects between
two successive stop consonants, which were attributed to the closure
interval as a cue for a change in place of articulation.

INTRODUCTION

It has long been known that the perception and production of stop
consonants vary with vocalic context (e.g., Dorman, Studdert-Kennedy, &
Raphael, 1977; Öhman, 1966; Sharf & Ohde, 1981). This is hardly surprising,
since a stop "consonant" is essentially just an abrupt way of stopping,
starting, or interrupting continuous articulations, such as vowels, [...]
articulation). Although some acoustic properties of stop consonants are
roughly invariant across different vocalic contexts (Blumstein & Stevens,
1979; Stevens & Blumstein, 1978), these properties are by no means the only
perceptual cues (Dorman & Raphael, 1980). There is a considerable literature
on vowel-dependent effects in stop consonant perception; these effects
generally reflect the way natural speech is patterned in the acoustic and
articulatory domains (Dorman et al., 1977; Liberman, Delattre, & Cooper, 1952;
Summerfield & Haggard, 1974).

*Parts of this paper were presented at the 103rd Meeting of the Acoustical
Society of America in Chicago, April 1982. The portion dealing with contrast
effects has been revised and is reported as Experiment 3 in "Bidirectional
contrast effects in the perception of VC-CV sequences," Perception &
Psychophysics, in press. The remainder is to be published in revised form in
the Journal of the Acoustical Society of America under the title,
"Coarticulation in sequences of two nonhomorganic stop consonants: Perceptual
and acoustic evidence."

Recent studies have revealed that stop consonants also interact with
other consonantal segments in their vicinity, not only with regard to voicing
(e.g., Klatt, 1975) but also with regard to place of articulation. This
evidence has come primarily from perceptual studies. Thus, Repp (1978, 1981)
has shown that the perception of a syllable-initial stop may be influenced by
a preceding, syllable-final stop (and vice versa), Mann (1980) found an
influence of a preceding, syllable-final liquid, and Mann and Repp (1981)
found an influence of a preceding fricative: listeners are more likely to
perceive a syllable ambiguous between /da/ and /ga/ as "ga" when it is
preceded by /d/, /s/, or /l/ than when it is preceded by /g/, /ʃ/, or /r/.
The general principle seems to be that an ambiguous stop is more likely to be
perceived as having a posterior place of articulation when it is preceded by a
consonantal segment that has an anterior place of articulation (relative to
some other possible context: /d/ vs. /g/, /s/ vs. /ʃ/, /l/ vs. /r/). There
are several possible explanations for these findings.

(1) The perceptual interaction between the precursor and the target
segment may take place at a purely auditory level of processing: The spectral
properties of the acoustic segment preceding the stop closure interval may
prime the auditory system in a way that modifies the internal spectral
representation of the signal onset following the closure, which contains the
important cues for the perception of stop place of articulation. If such an
auditory interaction takes place, it is likely to be contrastive: Prominent
spectral components of the preceding segment would adapt the neurons sensitive
to these frequencies, so that they respond more weakly to the following
segment. Indeed, there is evidence from physiological studies in animals that
such adaptation does take place in the auditory nerve (Delgutte, 1980; Harris
& Dallos, 1979). Considering the spectral complexity of the speech stimuli
used in the various perceptual studies, it is not clear whether auditory
adaptation of this sort really could account for the contrast effects
obtained, but the possibility certainly deserves attention. The present
research, however, is more directly concerned with a second class of
hypotheses.

(2) The other possibility is that perceptual contrast arises from
listeners' tendency to maximally differentiate successive phonetic segments on
the dimension of place of articulation, i.e., that the effect originates in
phonetic, as distinct from general auditory, properties of the stimuli. In
this case, it may be either a true perceptual effect or a response bias of
some sort. (a) If it is a response bias, its cause may be found in
statistical properties of the language, such as the frequencies of occurrence
of particular consonant sequences. This argument was effectively rejected by
Mann and Repp (1981) for one of the cases described (fricative-stop
sequences). (b) If it is a true perceptual effect, its cause may be found in
allophonic variability of stop consonants due to coarticulation with
neighboring segments. Since coarticulation is invariably assimilatory in
nature, listeners' perceptual compensation for such effects, to the extent
that it occurs, would have to result in contrastive effects. It is this
possibility (the coarticulation hypothesis, for short) that has received the
greatest attention in previous studies and that was also the primary concern
of the present experiment.

Evidence for the coarticulation of stops with preceding fricatives has
been obtained by Repp and Mann (1981, 1982). They demonstrated that, when the
fricative noises of natural fricative-stop-vowel utterances are excised
together with the stop release bursts, and the remaining periodic stimulus
portions are presented to listeners for identification, the (somewhat
ambiguous) stop consonants cued by the vocalic formant transitions are more
often assigned an anterior place of articulation when the excised fricative
context was /s/ than when it was /ʃ/. Repp and Mann (1981) also found that,
when the fricative noise of a fricative-stop-vowel utterance was replaced with
a synthetic noise ambiguous between /s/ and /ʃ/, listeners' fricative
identification was biased in the direction of the replaced segment. Both
findings suggest that the formant transitions following the stop closure (and,
in the later study, the stop release burst as well) carried coarticulatory
information about the preceding fricative. Repp and Mann (1982) subsequently
conducted acoustic measurements that confirmed an influence of preceding /s/
or /ʃ/ on the formant onset frequencies in the following signal portion,
although the articulatory interpretation of these effects was not
straightforward and there was large variability across different speakers and
utterance types. Still, the evidence in this case does favor the hypothesis
that compensation for fricative-stop coarticulation is the basis for the
effect of a preceding fricative on stop perception. Results reported by Mann
(1980) suggest that the coarticulation hypothesis may account also for the
perceptual effect of preceding liquids on stop consonant identification.

The present study was concerned with the contrastive influence of one
stop consonant on the perception of another (preceding or following) stop
consonant. The phenomenon of interest was first reported by Repp
(1978: Exps. 5 & 6). He preceded synthetic syllables ambiguous between /bɛ/
and /dɛ/ with either an unambiguous /ab/ or an unambiguous /ad/ and found
that, when the silent interval separating the two syllables was roughly
between 100 and 200 msec, listeners tended to report two different stops
(/abdɛ/, /adbɛ/) more often than a single stop (/abɛ/, /adɛ/). A similar
contrastive effect was found when syllables ambiguous between /ab/ and /ad/
were followed by either /bɛ/ or /dɛ/. In a subsequent study, Repp (1980a)
mapped the time course of these effects in considerable detail. He found
retroactive contrast (the effect of the second stop consonant on perception of
the first) to be considerably stronger than proactive contrast (the effect of
the first stop on perception of the second). Retroactive contrast was highly
dependent on the range of silent intervals employed and seemed to extend to
intervals beyond 200 msec; proactive contrast, on the other hand, was not
affected by range and was absent at intervals beyond 200 msec. No contrast
was obtained at short intervals of silence (less than 100 msec) where
listeners tended to report only a single (the second) stop consonant, an
interference phenomenon that has been studied extensively and will not concern
us here (see Dorman, Raphael, & Liberman, 1979; Repp, 1978, 1980b).

Let us consider these findings in the light of the two hypotheses
outlined above. The possibility of an auditory interaction in the case of two
stop consonants is perhaps increased by the fact that the spectral correlates
of the same stop in initial and final position are roughly similar, though far
from identical (especially not in different vocalic contexts; Repp, 1978).
Studies of selective adaptation, a phenomenon similar to contrast, have failed
to find effects of VC adaptors on (mirror-image) CV test stimuli (Ades, 1974;
Sawusch, 1977). In those studies, adaptors and test stimuli were separated by
several seconds of silence, which may have prevented the interaction studied
here. However, any auditory explanation would also have to account for the
existence of large retroactive effects and for the particular time course and
range dependency of these effects. The form that such an elaborate auditory
explanation might take is not clear at present.

When we consider a phonetic explanation of the perceptual contrast
between successive stop consonants, we might first ask whether it could be
some kind of response bias. One relevant consideration is that listeners may
prefer hearing two different stops because sequences of two identical stops
(as in /abbɛ/) rarely occur in English. However, this argument applies only
at rather long silent intervals, where contrast effects are small or absent;
at intervals between 100 and 200 msec, the choice is generally between hearing
either two different stops or a single stop. Since single intervocalic stops
are more frequent in the language than sequences of two different stops, the
response bias hypothesis must be rejected. Nevertheless, it could be that
listeners adopt a bias for reasons connected with their interpretation of the
experimental task; e.g., they might think that their ability to distinguish
two successive consonants is being tested. Clearly, such a bias cannot be the
whole explanation, considering the differences between proactive and
retroactive contrast and their changes over time. However, to examine that
possibility, Repp (1980a: Exp. 2) used an AXB task in which the listeners had
to discriminate stimuli drawn from a /ba/-/da/ (or /ab/-/ad/) continuum in the
presence of fixed /ab/ or /ad/ precursors (or /ba/ or /da/ postcursors) at two
different silent intervals. Contrast effects were found in all conditions,
suggesting that these effects are, at least in part, perceptual in nature.

Turning to the possible basis of such perceptual effects, we must take
note of the fact that, in production (of nonsense disyllables, at least),
sequences of two different stop consonants have much longer closure intervals
than single intervocalic stops; in fact, the ratio of average durations is
about two to one (Westbury, Note 1). It so happens that perceptual contrast
effects occur precisely at those intervals that are characteristic of two-stop
sequences. Thus, if these interval durations signal to the listener that two
stops have occurred rather than one, contrast effects would be a natural
result: Listeners would automatically adjust their phonetic interpretation of
an ambiguous stimulus portion so as to yield a place of articulation different
from that conveyed by the less ambiguous portion. Effects of interval range
on the magnitude of contrast may then be attributed to perceived changes in
speaking rate, and the bidirectionality and "time course" of the contrast
effects are readily predicted. The finding that retroactive contrast is
larger than proactive contrast requires an additional assumption: Perhaps
listeners delay phonetic decisions until the cues for both stop consonants
have been processed, and the fact that the cues for the first stop must be
held longer in auditory memory makes them more vulnerable to contextual
influences.

There is a third alternative hypothesis to consider, which is encouraged
by the findings on fricative-stop and liquid-stop sequences (Mann, 1980; Mann
& Repp, 1981; Repp & Mann, 1981). It is the possibility that the perceptual
contrast effects derive from listeners' compensation for a coarticulatory
dependency between two successive stop consonants. If it were the case that
the place of articulation of a stop shifts slightly toward that of a preceding
or following stop, as it seems to do in the case of a preceding fricative or
liquid, then a coarticulatory basis would exist for perceptual contrast. The
difference between proactive and retroactive contrast may then correspond to a
difference in the extent of forward or backward coarticulation, and the
decline of the perceptual effects over time may parallel a decline in the
extent of coarticulatory shifts as the closure interval is lengthened.

Quite apart from the question of whether coarticulation in two-stop
sequences is the cause of perceptual contrast effects, which would be
difficult to prove directly, we must ask whether such coarticulation exists at
all. If evidence of coarticulation were found, the hypothesis that relates it
to perception could be maintained; however, if no coarticulatory effects were
found, the hypothesis would be eliminated (barring the possibility that
coarticulatory variation was really present but not detected because, e.g.,
the methods of assessment were not sufficiently sensitive). The present study
investigated coarticulation using an indirect, perceptual method that was used
with some success by Mann (1980) and by Repp and Mann (1981). The basic
technique is to replace a portion of a natural utterance with a matched
synthetic segment that, however, is phonetically ambiguous, and to see whether
listeners tend to interpret the ambiguous segment as matching the replaced
segment. If so, it may be assumed that coarticulatory cues in the remaining



In the course of its search for coarticulatory variation, the present
study further investigated one aspect peculiar to two-stop sequences: The
closure period separating the two vocalic segments often contains a noise
burst generated by the articulatory release of the first stop. This release
burst, which occurs roughly in the middle of the closure interval, tends to be
shorter and of lower amplitude than the release bursts of utterance-final
stops (Abercrombie, 1967; Henderson & Repp, 1982; Repp, 1980b). Nevertheless,
it seems possible that these brief release bursts do carry some perceptual
information, either in their spectral properties or in their timing within the
two-stop closure. Since the burst derives from the release of the first stop,
it obviously contains some information specific to that stop's place of
articulation--the question could only be how important that information is to
a listener. The more interesting possibility studied here is that the burst
might also contain information about the following stop consonant, whose
closure is established at (or slightly before or after) the time at which the
closure of the first stop is released. Therefore, the present experiment
included a condition in which an ambiguous synthetic CV portion was preceded
by a natural VC portion (taken from a VCCV utterance) that included a final
release burst; this condition was compared to one in which the release burst
was replaced by silence.

Besides probing for coarticulatory variation, the present study also
investigated further the generality and nature of the perceptual interactions
between two successive stop consonants. For this purpose, it used all three
places of articulation; thus, for example, a stop ambiguous between /b/ and
/d/ was preceded not only by an unambiguous /b/ or /d/ but also by a /g/. If
perceptual contrast effects operate solely among members of the same category
(e.g., hearing the first stop as /b/ reduces the probability of hearing the
second stop as /b/), then a /g/ precursor should have little effect on a stop
ambiguous between /b/ and /d/, and the results should match those of a control
condition in which the ambiguous second half of the stimulus is presented in
isolation. On the other hand, Repp (1980b) observed curious and rather
complex perceptual interactions between all three stop categories at somewhat
shorter closure intervals than those used here, and it was to be seen whether
those findings could be replicated.

An important consideration that may bear on the generality of contrast
effects is the choice of response alternatives for the subjects. Repp (1978,
1980a; Exp. 1b) gave his subjects the choice of writing down two different
stops or a single stop. However, since closure intervals between 100-200 msec
tend to be too long for single stops, the menu of alternatives may have been
partially responsible for the contrast effects observed. In the present
study, the subjects always wrote down two responses, one for the first and one
for the second stop, and they were told that the two consonants could be
either different or the same. Although a preference for reporting two
different stops may still be predicted on the grounds that the closure
intervals were too short for geminate stop consonants (Pickett & Decker, 1960;
Repp, 1978), the subjects knew that the stimuli contained VC and CV portions
that they had previously heard in isolation and that simply had to be
identified in close succession. Short of probing for a single stop at a time,
this is probably as close as one could get to instructions that were not
biased in the direction of contrast.

METHOD



Subjects 

A total of twelve subjects participated. Four of them--two paid student
volunteers, the author, and a graduate research assistant--listened to both
sets of tapes (described below). Each set was presented to four additional
student volunteers who listened to one set only. All volunteers were native
speakers of American English. The author and the research assistant are
native speakers of Austrian German and midwestern Scots English, respectively.

The author (BR) and a linguist colleague (GC), a native speaker of
American English, produced the original sets of utterances. It was considered
unlikely that the author's native German would render either his production or
his perception different from those of the other participants, since the study
was concerned with phonetic distinctions that are similar in English and
German. However, to forestall any possible objections to the author as a
speaker, two parallel sets of stimuli were used.

Stimuli

Natural utterances. Speakers GC and BR each recorded a set of nonsense
utterances which included five tokens each of /abda/, /abga/, /adba/, /adga/,
/agba/, /agda/ (as well as /aba/, /ada/, /aga/, which were not used in the
perceptual experiment). The utterances were produced with stress on the first
syllable, so as to prevent reduction of the first vowel. The speakers read at
a steady speed from a randomized list into a Sennheiser MKH 413T microphone
whose response was recorded by a Crown 822 tape recorder.

A representative token of a VCCV utterance (/adga/ produced by GC) is
shown in Figure 1 in the form of an oscillographic trace. It contains three
major acoustic segments: the segment from onset to the beginning of the
closure (the VC portion); the closure period; and the segment from the release
of the closure to the end (the CV portion). Roughly in the middle of the
closure period, there is a brief noise burst deriving from the articulatory
release of the first stop consonant (the VC release burst). Although this
burst is sometimes absent in fluent speech (Henderson & Repp, 1982), it is
generally found in isolated utterances of the present kind (Repp, 1980b). All
but two of BR's and all but one of GC's VCCV tokens contained VC release
bursts. The average durations of the three major segments (VC, closure, CV)
were 122, 152, and 299 msec for GC and 165, 150, and 240 msec for BR. The
average durations of the VC release bursts were 22 and 21 msec, respectively.
(See the Appendix for a more detailed acoustic analysis.)

All utterances were digitized at 10 kHz using the Haskins Laboratories
pulse code modulation system. Each utterance was divided into its three major 
segments, which were stored in separate computer files. 

Synthetic stimuli. Eight continua of synthetic syllables were generated,
four for each speaker. They ranged, respectively, from /ab/ to /ad/, from
/ad/ to /ag/, from /ba/ to /da/, and from /da/ to /ga/. To match the endpoint
stimuli as closely as possible to the corresponding segments of natural
utterances, good-sounding natural tokens of the relevant segments were
selected from the recorded VCV utterances and analyzed with the aid of a
Federal Scientific UA-6A spectrum analyzer. The resulting computer
spectrograms were displayed on an oscilloscope, and the three lowest formants
were tracked by an automatic peak-picking procedure. The formant tracks were
then traced with a light pen whose output was automatically converted into
frequency parameters for the OVE IIIc serial-resonance synthesizer. In this
way, synthetic copies of /ab/, /ad/, /ag/, /ba/, /da/, and /ga/ were obtained
for both GC and BR.

Within each set of VC or CV utterances, all stimuli were assigned the
same fundamental frequency contour, amplitude contour, and duration. The
first-formant frequencies were also equalized at some compromise values, as
were the steady-state vocalic portions. Thus, the stimuli differed only in
the transitions of the second and/or third formant.

Schematic representations of these stimuli in terms of connected
synthesizer parameter values are provided in Figures 2 and 3. Although their
durations were not exactly matched to the average durations of the
corresponding portions of each speaker's VCCV utterances (rather, they
represent the durations of the particular VCV tokens copied), they do reflect
the fact that GC generally put relatively less stress on the first syllable
than did BR.




Figure 2. Formant frequencies (connected synthesis parameters) of the
syllables that served as the end points of the synthetic continua (GC
set).

Figure 3. Formant frequencies (connected synthesis parameters) of the
syllables that served as the end points of the synthetic continua (BR
set).

Differences between the two speakers in formant transitions will also be
noted, as well as the fact that speaker GC produced much higher second
formants than BR. Fundamental frequency changed linearly within stimuli as
follows: 99-91 Hz (VC) and 90-120 Hz (CV) for the GC stimuli; and 132-124 Hz
(VC) and 100-58 Hz (CV) for the BR stimuli. These values again reveal
differences between the two speakers in relative stress assignment and
intonation, which thus were preserved in the synthetic copies.

Seven-member synthetic continua from /ab/ to /ad/, /ad/ to /ag/, /ba/ to
/da/, and /da/ to /ga/ for each speaker were produced by linear interpolation
between the formant tracks of the two respective end point stimuli, in roughly
equal steps. All stimuli were digitized at 10 kHz.
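
The interpolation step can be illustrated with a minimal sketch. It assumes
that each end point is represented as a list of formant frequencies in Hz,
one per synthesis frame; the function name and the F2 values below are
hypothetical and are not taken from the original stimuli or from the OVE IIIc
parameter files.

```python
def make_continuum(track_a, track_b, n_members=7):
    """Return n_members formant tracks stepping linearly from track_a to track_b.

    track_a and track_b are equal-length lists of frequencies (Hz), one value
    per synthesis frame. Member 0 equals track_a; the last member equals track_b.
    """
    if len(track_a) != len(track_b):
        raise ValueError("end point tracks must have the same number of frames")
    continuum = []
    for step in range(n_members):
        w = step / (n_members - 1)  # 0.0 at the first end point, 1.0 at the second
        continuum.append([(1 - w) * a + w * b for a, b in zip(track_a, track_b)])
    return continuum


# Hypothetical second-formant transitions (Hz) for two end point syllables.
f2_first = [1200.0, 1100.0, 1000.0]
f2_second = [1200.0, 1400.0, 1600.0]
continuum = make_continuum(f2_first, f2_second)
```

With seven members, the fourth stimulus lies exactly halfway between the two
end points in every frame, which is why it would be expected to be the most
ambiguous member if the category boundary fell in the middle of the continuum.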

Experimental conditions. Two parallel sets of tapes were recorded, one
using the GC stimuli and one using the BR stimuli. Within each set, there
were three subsets of tapes corresponding to three separate experimental
sessions. They will be termed, respectively, the Backward, Forward, and
Forward-With-Release conditions.

The Backward condition investigated the influence of natural CV portions
on the perception of synthetic VC portions. It included 5 tapes with random
sequences of the following:

(1) The 7 stimuli from the synthetic /ab/-/ad/ continuum, repeated 10 
times. 

(2) The 7 stimuli from the synthetic /ad/-/ag/ continuum, repeated 10 
times. 

(3) The 30 natural CV portions (3 syllables, each from 2 different VC 
contexts, 5 tokens of each), repeated 5 times. 

(4) The synthetic /ab/-/ad/ stimuli followed by the natural CV portions 
after a fixed silent interval, a total of 7 X 30 = 210 combinations. 

(5) As in (4), with the synthetic /ad/-/ag/ stimuli.
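
The composition of tape (4) can be sketched as follows. This is only an
illustration of the counting (7 continuum members X 30 natural CV portions =
210 combinations); the tuple layout, the fixed random seed, and the variable
names are assumptions made for the example, not details of the original tape
preparation.

```python
import random
from itertools import product

# Each natural CV syllable and the two VC contexts it was excised from.
cv_contexts = {"ba": ("ad", "ag"), "da": ("ab", "ag"), "ga": ("ab", "ad")}

# 3 syllables x 2 original contexts x 5 tokens = 30 natural CV portions.
natural_cv = [
    (context, syllable, token)
    for syllable, contexts in cv_contexts.items()
    for context in contexts
    for token in range(1, 6)
]

# The 7 members of one synthetic VC continuum, numbered 1 to 7.
continuum_members = list(range(1, 8))

# Every synthetic VC stimulus paired with every natural CV portion.
combinations = list(product(continuum_members, natural_cv))

# Randomize the presentation order (the seed is arbitrary, for repeatability).
rng = random.Random(0)
tape_order = combinations[:]
rng.shuffle(tape_order)
```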

The Forward condition investigated the influence of natural VC portions
on the perception of synthetic CV portions. It included five tapes analogous
to those in the Backward condition:

(1) The 7 stimuli from the synthetic /ba/-/da/ continuum, repeated 10 
times. 

(2) The 7 stimuli from the synthetic /da/-/ga/ continuum, repeated 10 
times. 

(3) The 30 natural VC portions (3 syllables, each from 2 different CV
contexts, 5 tokens of each), repeated 5 times. These stimuli did
not include the release bursts of the syllable-final stop consonant.

(4) The natural VC portions followed by the synthetic /ba/-/da/ stimuli 
after a fixed silent interval, a total of 7 X 30 = 210 combinations.

(5) As in (4), with the synthetic /da/-/ga/ stimuli. 

The Forward-With-Release condition assessed the perceptual contribution
of the VC release that was embedded in the closure period of the original
utterances. This condition included three tapes similar to tapes (4), (5),
and (3) of the Forward condition:

(1) The 30 natural VC portions followed by the original closure period
(which included the VC release burst) and by the synthetic /ba/-/da/
stimuli, a total of 210 stimuli.

(2) As in (1), with the synthetic /da/-/ga/ stimuli.

(3) The 30 natural VC portions followed by the original closure period,
repeated 5 times.

In the Forward and Backward conditions, the silent interval separating
the VC and CV portions on tapes (4) and (5) was 130 msec on the GC tapes and
150 msec on the BR tapes. These values matched the average VCCV closure
durations of the two speakers. The interstimulus intervals were 2.5 sec on
the tapes containing single VC or CV syllables, and 3 sec on those containing
VC-CV combinations, with longer intervals from time to time. An exception was
tape (3) of the Forward-With-Release condition, where the interstimulus
intervals were increased to 4 sec.

Procedure 

The three conditions were administered in three different sessions on
different days. The order of the Forward and Backward conditions was varied
across subjects; the Forward-With-Release condition was always last. Within
each condition, the tapes were presented in the order listed, except that the
sequence of tapes differing only in the nature of the synthetic stimuli (/b-d/
vs. /d-g/) was varied across subjects.

When listening to tapes containing isolated VC or CV syllables, the
subjects' task was to identify the stop consonants as "b," "d," or "g." All
three alternatives were given, even when the stimuli were intended to cover
only two categories. Tape 3 of the Forward-With-Release condition (VC
portions only) was an exception. Here, the subjects were instructed to
identify the syllable-final stop and the stop that might have followed it in
the original VCCV utterance, guessing if necessary, with the restriction that
the two stops always be different from each other. Thus, subjects chose from
six response alternatives here ("bd," "bg," "db," "dg," "gb," and "gd"). When
listening to tapes containing VC-CV combinations, the subjects chose from nine
alternatives: "bb," "bd," "bg," "db," "dd," "dg," "gb," "gd," and "gg." All
nine responses were permitted even though only six were intended to be
relevant to a given tape. The subjects were told that the stimuli consisted
of the VC and CV components they had heard before, that the stop consonants in
both components were to be identified, and that these consonants could be
either the same or different. Single-consonant responses ("b," "d," "g") were
not permitted and certainly not appropriate under these instructions.
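
The two response sets described above are simply the ordered pairs of the
three stop labels, with and without repetition. A brief sketch (the variable
names are illustrative only):

```python
from itertools import permutations, product

stops = "bdg"

# Nine alternatives for VC-CV combinations: same or different consonants.
nine_alternatives = ["".join(pair) for pair in product(stops, repeat=2)]

# Six alternatives for tape 3 of the Forward-With-Release condition:
# the two stops were required to differ.
six_alternatives = ["".join(pair) for pair in permutations(stops, 2)]
```

The three responses that appear only in the larger set ("bb," "dd," "gg") are
the same-consonant responses whose avoidance is the signature of perceptual
contrast.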

The stimulus tapes were played back on an Ampex A6300 tape recorder, and 
the subjects listened over TDH-39 earphones in a quiet room. 

RESULTS

Identification of Natural-Speech Stimuli

This part of the data is worth examining not only to ascertain that the
natural-speech stimuli were generally identified correctly, but also to check
whether the pattern of errors--to the extent that any errors occurred--
revealed anything about coarticulatory variation in these stimuli.

CV portions. The natural CV portions were presented three times: once
in isolation (5 repetitions), and twice preceded by synthetic VC portions (7
repetitions each time). Since there were 5 different tokens of each
utterance, a total of 5 X 19 = 95 responses was obtained from each subject to
each of the six basic syllables: /(ad)ba/, /(ag)ba/, /(ab)da/, /(ag)da/,
/(ab)ga/, and /(ad)ga/. (The portion in parentheses indicates the original
context.)

The CV portions were expected to be very accurately identified, since the
place-of-articulation information should not have been much reduced by
removing the preceding signal portion. However, this expectation was confirmed
only for the BR stimuli (99.3 percent correct), not for the GC stimuli (90.3
percent correct). Thus, while the BR stimuli were more satisfactory in terms
of intelligibility, the GC stimuli yielded errors that may contain some
interesting information. The following analysis considered the GC stimuli
only.

A first comparison showed CV identification to be less accurate in
isolation (86.0 percent correct) than when preceded by a synthetic VC portion
(92.4 percent correct)--a significant difference, F(1,7) = 16.5, p < .01.
Although this effect was confounded with a possible improvement due to
practice, it seems likely that it represented a true improvement of CV
identification in VC context. However, since the error pattern across
different CV tokens was the same regardless of context, the data were pooled
for the following analysis.

A confusion matrix for the six individual CV stimuli (all five tokens
combined) is shown in Table 1. It is evident that /ba/ was less accurately
identified than /da/ and /ga/, with most of the errors deriving from those
tokens of /ba/ that had been preceded by /ad/ in the original utterance. Note
that these errors consisted of (incorrect) "d" responses to /ba/; thus, they
matched the original context (/ad/). A similar, though smaller, difference
can be seen in the errors for /da/ stimuli: "g" responses were more frequent
when the original context had been /ag/ than when it had been /ab/. While
this difference was not exhibited by all subjects, the difference in /ba/
identification was significant, F(1,7) = 14.7, p < .01. Thus, here is an
indication of a coarticulatory influence of a preceding stop on speaker GC's
production of /ba/.

We may ask whether CV identification was in any way influenced by the
nature of a preceding synthetic VC portion. Inspection of the data revealed
that the number of (incorrect) "d" responses to /(ad)ba/ increased more than
twofold as the synthetic VC precursors changed from /ab/ to /ad/; however, the
error probability was about the same for preceding /ad/ and /ag/. The meaning
of this pattern is not clear; it does not represent a contrast effect.

Table 1

Confusion Matrix for CV Stimuli (GC Set)

Stimulus        Response (percent)
                "b"     "d"     "g"

(ad)ba           69      30       1
(ag)ba           93       7       0
(ab)da            0      98       2
(ag)da            1      95       6
(ab)ga            0       6      94
(ad)ga            0       6      94

VC portions. The natural VC portions were presented three times without
VC release bursts and three times with VC release bursts. In each case, the
stimuli occurred once in isolation (5 repetitions) and twice followed by
synthetic CV portions (7 repetitions each time). Since there were 5 different
tokens of each utterance, a total of 5 X 19 = 95 responses was obtained from
each subject to each of the two versions of the six basic syllables:
/ab(da)/, /ab(ga)/, /ad(ba)/, /ad(ga)/, /ag(ba)/, and /ag(da)/.

Since unreleased syllable-final stops are generally not easy to identify,
subjects' labeling of VC stimuli without release bursts was not expected to be
perfect. Overall, GC's VC tokens were correctly identified on 86.4 percent of
the trials; BR's tokens, on 93.8 percent. As expected, GC's stops were more
accurately identified when the VC release burst was included (92.2 percent
correct) than when it was missing (80.6 percent correct); however, there was
no difference for BR's stops (94.0 vs. 93.6 percent correct). In contrast to
CV syllables, identification of VC syllables did not improve in the context of
an added synthetic stimulus portion. For GC's tokens, the percentages were
87.7 in isolation and 85.7 in context; for BR's tokens, the corresponding
percentages were 93.4 and 94.0.

Confusion matrices are shown in Table 2. Effects of original context
were small but generally in the expected direction. Thus, for example, GC's
tokens of /ad(ba)/ without a release burst received more "b" responses but
fewer "g" responses than /ad(ga)/. Because of the uneven distribution of
errors, no statistical analysis was conducted on these data; while there may
be some suggestions of coarticulatory influences of a following stop on VC
production, no strong evidence for such effects can be seen.

Table 2

Confusion Matrices for VC Stimuli (GC and BR Sets),
Without and With Release Bursts

                           Response (percent)
Stimulus     Without release burst      With release burst
             "b"     "d"     "g"        "b"     "d"     "g"

GC
ab(da)        95       3       2         93       6       1
ab(ga)        95       2       3         95       3       2
ad(ba)        10      81       9          2      91       7
ad(ga)         8      78      14          0      94       6
ag(ba)        14      22      64          1       4      95
ag(da)         8      21      71          1      14      85

BR
ab(da)        99       1       0         93       7       0
ab(ga)        99       1       0         90       7       3
ad(ba)         7      88       5          2      98       0
ad(ga)         5      92       5          3      93       4
ag(ba)         3       5      92          1       2      97
ag(da)         2       8      90          1       6      93
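
As a consistency check, the overall percent-correct scores quoted in the text
for GC's VC tokens (80.6 without and 92.2 with the release burst) can be
recovered, within rounding, by averaging the correct-response cells of
Table 2. The sketch below assumes that the reported scores are unweighted
means over the six stimuli, which is an inference from the numbers rather
than something stated in the text; the dictionary and function names are
illustrative.

```python
# Percent correct responses to GC's VC portions (correct-response cells of Table 2).
gc_without_burst = {
    "ab(da)": 95, "ab(ga)": 95,
    "ad(ba)": 81, "ad(ga)": 78,
    "ag(ba)": 64, "ag(da)": 71,
}
gc_with_burst = {
    "ab(da)": 93, "ab(ga)": 95,
    "ad(ba)": 91, "ad(ga)": 94,
    "ag(ba)": 95, "ag(da)": 85,
}


def mean_percent_correct(scores):
    """Unweighted mean of the per-stimulus percent-correct scores."""
    return sum(scores.values()) / len(scores)


without_burst = mean_percent_correct(gc_without_burst)  # close to the reported 80.6
with_burst = mean_percent_correct(gc_with_burst)        # close to the reported 92.2
burst_benefit = with_burst - without_burst               # gain due to the release burst
```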

Identification of second (syllable-initial) stop from VC portion plus
release burst. In the subcondition where the natural VC portions were
presented in isolation and included the release burst of the syllable-final
stop, the subjects were asked to identify also the following, syllable-initial
stop (guessing if necessary). Their success in doing so was assessed by
considering only those trials on which the first stop was identified
correctly, for the subjects had been told that the second stop was always
different from the first. About 92 percent of the trials met that requirement.



Table 3

Identification of Second Stop, Given Correct Identification
of First Stop, from Isolated VC Portions Including Release Burst

                       GC                                BR
Stimulus               Response (percent)
             "b"   "d"   "g"   Correct        "b"   "d"   "g"   Correct

ab(da)              81    19                         80    20
                                  69                               63
ab(ga)              43    57                         54    46

ad(ba)        69          31                   65          35
                                  79                               73
ad(ga)        11          89                   19          81

ag(ba)        80    20                         70    30
                                  78                               81
ag(da)        24    76                          9    91

Mean                              75                               72



Table 3 shows these conditional response percentages, as well as percent
correct scores (50 percent correct is chance level). It is evident that
performance was much better than chance for both sets of stimuli as a whole,
and for each place of articulation of the first stop, although scores were
significantly lower for labial than for alveolar or velar stops
[F(2,14) = 6.3, p < .05, in the GC set; F(2,14) = 5.7, p < .05, in the BR
set]. Clearly, the stimuli contained information about the place of
articulation of the second stop consonant. This information was almost
certainly conveyed by the VC release burst, despite its short duration.
Unfortunately, the present study did not include a condition in which subjects
were asked to identify the second stop from VC portions without release
bursts. However, in the author's opinion, performance in such a task would
hardly have exceeded chance. A comparison of the results of the Forward and
Forward-With-Release VC-CV conditions, to be described below, confirmed that
the VC release burst, and not the pre-closure formant transitions, contained
the significant coarticulatory information.
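
The percent-correct scores in Table 3 appear to be pooled over the two stimuli
that share a first stop. The sketch below reproduces them under that
assumption (an unweighted mean of the two correct-response percentages) and
compares each with the 50 percent chance level; the data layout and function
name are illustrative.

```python
CHANCE = 50.0  # two response alternatives once the first stop is known

# Percent correct identification of the second stop (from Table 3).
correct = {
    "GC": {"ab(da)": 81, "ab(ga)": 57, "ad(ba)": 69,
           "ad(ga)": 89, "ag(ba)": 80, "ag(da)": 76},
    "BR": {"ab(da)": 80, "ab(ga)": 46, "ad(ba)": 65,
           "ad(ga)": 81, "ag(ba)": 70, "ag(da)": 91},
}


def pooled_by_first_stop(scores):
    """Average percent correct over stimuli sharing the same first stop."""
    groups = {}
    for stimulus, percent in scores.items():
        first_stop = stimulus[1]  # e.g. "ab(da)"[1] is "b"
        groups.setdefault(first_stop, []).append(percent)
    return {stop: sum(vals) / len(vals) for stop, vals in groups.items()}


gc_pooled = pooled_by_first_stop(correct["GC"])
br_pooled = pooled_by_first_stop(correct["BR"])
all_above_chance = all(
    score > CHANCE for pooled in (gc_pooled, br_pooled) for score in pooled.values()
)
```

Under this reading, the labial first stops give the lowest pooled scores in
both sets, in line with the significant effect of place of articulation
reported above.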

The Backward VC-CV Condition 

In this condition, synthetic VC stimuli were followed by natural CV
portions. The results will be described in two stages: First, coarticulatory
effects (i.e., effects of the original VC context, which was displaced by the
synthetic VC syllables) will be discussed, averaging over all stimuli on a
synthetic continuum. Then, other perceptual interactions between the two
signal portions will be examined in terms of labeling functions for the
synthetic continua, averaging over original VC contexts.

One subject's responses to the BR stimuli were excluded because (although
he was able to distinguish /ab/ from /ad/ and /ad/ from /ag/ in isolation) he
labeled all stimuli from the /ab/-/ad/ continuum as "b" and all stimuli from
the /ad/-/ag/ continuum as "d" when they were followed by natural CV portions.

Coarticulatory effects. The response percentages are shown in Table 4,
averaged over all members of each synthetic continuum. Coarticulatory effects
would be apparent, for example, if more "d" responses and fewer "g" responses
had been obtained to VC stimuli followed by /(ad)ba/ than to those followed by
/(ag)ba/. However, it is evident from the table that such effects were
generally absent. The largest difference obtained (9 percent more "g"
responses when GC's /ad/-/ag/ stimuli were followed by /(ab)da/ than when they
were followed by /(ag)da/) was not in accord with the predictions. All other
differences, whether in the expected direction or not, were extremely small.
Thus, if there was any coarticulatory information in the CV portion, the
subjects did not make any use of it.

Other perceptual interactions. That VC perception was not completely
independent of CV context is already evident from Table 4. First, even though
the synthetic VC stimuli, when presented in isolation, were classified only
into the two relevant categories (except for 2 percent "b" responses to GC's
/ad/-/ag/ continuum), responses in the third category did occur in CV context.
In part, these responses may have reflected just general uncertainty and
occasional order reversals of the two responses on a trial. In part, they
were probably due to a genuine change in the perception of the VC portion.
Second, it can be seen that the first stop was less often classified into a
given category when that category was appropriate also for the second stop.
In other words, the subjects tended to avoid the "bb," "dd," and "gg"
responses, and instead favored responses of two different consonants. This is
the expected retroactive contrast effect. It is depicted in more detail in
Figure 4, where the baseline identification functions for isolated VC stimuli
are also shown. For reasons of clarity, only responses in the "d" category
have been plotted, which are relevant on both the /ab/-/ad/ and /ad/-/ag/
continua.

Table 4

Backward Condition: Identification of Synthetic VC
Syllables in Different Natural-CV Contexts

                  /ab/-/ad/                /ad/-/ag/
                  continuum                continuum

CV context        Response to VC portion (percent)
             "b"     "d"     "g"        "b"     "d"     "g"

GC
(ad)ba        61      27      12          4      26      70
(ag)ba        55      27      18          3      26      71
(ab)da        75      16       9          8      29      63
(ag)da        75      16       9          8      38      54
(ab)ga        69      17      14          5      25      70
(ad)ga        68      19      13          5      29      66

BR
(ad)ba        48      42      10          5      54      41
(ag)ba        47      46       7          2      59      39
(ab)da        64      31       5         24      42      34
(ag)da        59      35       6         21      44      35
(ab)ga        54      40       6         17      56      27
(ad)ga        53      43       4         15      56      29

It is clear from the figure that the major influence of the following CV
portion was a general increase in response uncertainty, as reflected in the
shallower slopes of the VC-CV labeling functions relative to the baseline
provided by the labeling functions for isolated VC stimuli. In the GC /ab/-
/ad/ set there was also a marked reduction in "d" responses to the VC portion,
F(1,7) = 24.6, p < .001, which was primarily due to an increase in "g"
responses: VC stimuli unambiguously identified as "d" in isolation received
30-40 percent "g" responses in CV context.

The extent of retroactive contrast may be assessed by comparing in each
panel of Figure 4 the two labeling functions for which the CV context
represented the stop categories that constituted the endpoints of the VC
continuum. Thus, more "d" responses (fewer "b" responses) were obtained on
each /ab/-/ad/ continuum when the following stimulus portion was /ba/ than
when it was /da/: F(1,7) = 8.6, p < .05, for "b" responses, nonsignificant
for "d" responses because of "g" intrusions; F(1,7) = 11.8, p < .05, for "d"
responses to the BR set. Similarly, more "d" responses were obtained on the
BR /ad/-/ag/ continuum in the context of /ga/ than in the context of /da/,
F(1,6) = 7.2, p < .05. However, a nonsignificant difference in the opposite
direction was present on the GC /ad/-/ag/ continuum. Thus, in three out of
four conditions there were perceptual contrast effects of the CV portion on VC
perception, but in one condition such effects were absent. The reason for
this difference is not clear.
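
The significance levels attached to the F ratios above can be checked by
hand. The following is a minimal sketch, not the authors' procedure: it relies
on the identity F(1, n) = t(n) squared and integrates the t density
numerically. The function names and the number of integration steps are
illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def p_value_f_1_df(f_stat, df_error, steps=4000):
    """Upper-tail probability for F(1, df_error), using F(1, n) = t(n)**2.

    The central area P(0 < T < sqrt(F)) is obtained with Simpson's rule
    (steps must be even); the p value is the remaining two-tailed area.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    central_half = total * h / 3
    return 1.0 - 2.0 * central_half

# F ratios and error degrees of freedom quoted in the paragraph above.
for f_stat, df_error in [(8.6, 7), (11.8, 7), (7.2, 6)]:
    print(f"F(1,{df_error}) = {f_stat}: p = {p_value_f_1_df(f_stat, df_error):.4f}")
```

Each of the three ratios should come out between .01 and .05, in line with
the "p < .05" reported for them.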

Finally, we may examine the labeling function obtained when the CV
context represented the category extraneous to the VC continuum. In the case
of the /ab/-/ad/ continua, the /ga/ context had an effect similar to the /da/
context for GC stimuli, but similar to the /ba/ context for BR stimuli. In
the case of the /ad/-/ag/ continua, the /ba/ context was more similar to the
/ga/ context than to the /da/ context in both stimulus sets, but the match was
not close for the BR stimuli. Statistical tests comparing the average results
for the two "relevant" contexts with those for the "neutral" context yielded
no significant differences.

The Forward VC-CV Condition



In this condition, synthetic CV stimuli were preceded by natural VC
portions that did not include any release bursts. Again, the data were first
examined to see whether any coarticulatory effects were present, and,
subsequently, whether there were any other perceptual interactions between
the two signal portions.

Coarticulatory effects. Response percentages pooled across the members
of each synthetic stimulus continuum are shown in Table 5. As in Table 4,
there is no evidence of any influence of the original CV context on the
responses to the synthetic CV portions. Thus, we must again conclude that
coarticulatory cues were either not present in the VC formant transitions or
were not registered by the listeners.

Figure 4. Backward condition: labeling functions. (Abscissa: VC stimulus
number.)

Table 5

Forward Condition: Identification of Synthetic CV
Syllables in Different Natural-VC Contexts

                 /ba/-/da/ continuum      /da/-/ga/ continuum
VC context          Response to CV portion (percent)
                  "b"    "d"    "g"       "b"    "d"    "g"

GC
ab(da)             56     43      1         0     61     39
ab(ga)             57     42      1       [4]    [1]     37
ad(ba)             65     34      1         1     67     32
ad(ga)             64     34      2         3     65     32
ag(ba)             57     42      1         1     72     27
ag(da)             53     44      3         1     67     32

[BR]
ab(da)             53     46    [ ]        14     41     45
ab(ga)             54     45    [ ]         6     45     49
ad(ba)             74     25    [9]        28     30     42
ad(ga)             72     27    [ ]        29     33     38
ag(ba)             51     48    [ ]         9     65     26
ag(da)             52     47    [ ]         8     62     30

Other perceptual interactions. Table 5 shows that, with the exception of
BR's /da/-/ga/ continuum, which received a substantial number of "b"
responses, responses in the "third category" were very infrequent. Thus,
synthetic CV stimuli in VC context seemed to be more stable perceptually than
synthetic VC stimuli in CV context. However, Table 5 also shows clear
evidence of an influence of the VC portions on CV perception, which is shown
graphically in Figure 5.

Considering first the /ba/-/da/ continua, we see a contrast effect,
particularly in the BR stimuli: The synthetic CV stimuli received fewer "d"
responses when preceded by /ad/ than when preceded by /ab/, F(1,7) = 7.0,
p < .05, for the GC set; F(1,7) = 12.5, p < .01, for the BR set. In both the
GC and BR sets, the "neutral" /ag/ precursor had the same effect as /ab/; this
was reflected in significant differences between the combined /ab/ and /ad/
precursor results and the /ag/ precursor results, F(1,7) = 6.2, p < .05, for
the GC set; F(1,7) = 20.9, p < .01, for the BR set. The two stimulus sets
differed from each other in that VC precursors consistently reduced the rate
of "d" responses relative to the isolated-CV baseline in the GC set,
F(1,7) = 126.3, p < [...] the BR set.

The results for the /da/-/ga/ continua were more variable. In the GC
set, the proportion of "d" responses was only slightly lower in /ad/ context
than in /ag/ context (a nonsignificant contrast effect), and in both those
contexts there were more "d" responses than in the neutral /ab/ context,
F(1,7) = 11.6, p < .05, or in isolation. In the BR stimulus set, on the other
hand, there was a very large difference between the effects of /ad/ and /ag/
precursors, a pronounced proactive contrast effect, F(1,7) = 64.5, p < .001.
The labeling function for /ab/ precursors fell between these two extremes,
somewhat below that for isolated CV syllables. Closer examination of these
data revealed that the decreases in "d" responses with /ab/ and especially
with /ad/ precursors were primarily due to an increase in "b" (rather than
"g") responses (cf. Table 5). When considered in terms of "g" responses, the
contrastive effect of /ad/ vs. /ag/ precursors was much smaller than the
differences shown in Figure 5 for "d" responses, although it was still
significant, F(1,7) = 6.3, p < .05. The "b" intrusions occurred even though
not a single "b" response was given to the synthetic /da/-/ga/ stimuli in
isolation. The reason for their occurrence in context presumably lay in the
spectral structure of the stimuli (cf. Figure 3): The synthetic /da/ and /ba/
were not very different in the BR set, certainly much less so than in the GC
set.

Finally, we may ask whether the synthetic CV portions had any influence
on the way the preceding natural VC portions were labeled. There was more
information in the data here than in the Backward condition, because natural
VC portions were less accurately labeled than natural CV portions. Changes in
VC error percentages as a function of CV stimulus number are shown in Figure
6. It can be seen that, with three striking exceptions, there was not much
change. The exceptions are, in the GC set, a dramatic increase in "b"
responses to both /ad/ and /ag/ when they were followed by the most /da/-like
stimuli from the /ba/-/da/ continuum, and, in the BR set, a clear increase in
"d" responses to /ag/ when it was followed by the most /ga/-like stimuli from
the /da/-/ga/ continuum. Two of those effects are clearly contrastive in
nature; the third ("b" responses to /ag/ when followed by /da/) is mysterious
but has been observed previously (Repp, 1980b). What is disturbing is that
all three effects were obtained only with one stimulus set but not with the
other, and that no retroactive contrast was obtained in a number of other
cases portrayed in Figure 6.

Figure 5. Forward condition: labeling functions.

Figure 6. Forward condition: VC confusions as a function of CV stimulus
number. (Abscissa: CV stimulus number; curves: /ad/->"b", /ag/->"b",
/ad/->"g", /ag/->"d".)

The Forward-With-Release VC-CV Condition

In this condition, the natural VC and synthetic CV portions were
separated by the original closure period that followed the natural VC portion.
Roughly in the center of the closure interval, there was a VC release burst of
varying duration and intensity.

Coarticulatory effects. The response percentages are shown in Table 6.
In contrast to the Forward condition without release bursts (Table 5), we see
pronounced coarticulatory effects here. In every instance, there was an
increase in the response category corresponding to the original CV context
(underlined in the table), even when that category was extraneous to the
synthetic CV continuum. (The effect was significant at p < .01 in all four
stimulus series.) Thus, the release burst (and, possibly, the following
closure interval) provided significant information about the following,
syllable-initial stop consonant, and this information was integrated with (and
sometimes dominated) the cues contained in the synthetic CV portion.

Closer inspection of the data revealed that, in all four stimulus series,
coarticulatory effects were strongest when the first stop was labial and
weakest when it was velar. [These differences reached significance only on
the two GC continua: F(2,14) = 5.5 and 7.2, p < .05 and p < .01,
respectively.] This finding is unexpected because, in the earlier condition
where the second stop was to be identified from the release burst alone,
subjects were most accurate with velar bursts and least accurate with labial
bursts (see Table 3). This reversal is curious and remains unexplained.

Other perceptual interactions. To compare the effects of preceding /ab/,
/ad/, and /ag/ plus VC release bursts on CV perception, it would be somewhat
misleading to plot labeling functions averaged over original CV contexts (as
in Figures 4 and 5). Since the release bursts provided cues to the second
stop, the /ad/ precursor, for example, which contained cues to following /b/
or /g/, would naturally be expected to generate fewer "d" responses than
/ag/, which contained cues to following /b/ or /d/, or /ab/, which contained
cues to following /d/ or /g/. On the other hand, plots of labeling functions
for all six different VC precursors would be confusing. Therefore, the
relevant comparisons are best made in Table 6.

For example, consider the /ba/-/da/ continua and compare the response
frequencies for the precursors /ab(ga)/ and /ad(ga)/. Both of these contain
coarticulatory cues for /g/; therefore, whatever different effects they have
must primarily be due to the nature of the syllable-final stop. It is evident
from Table 6 that, in both stimulus sets, there were more "d" responses
following /ab(ga)/ and more "b" responses following /ad(ga)/, a clear
proactive contrast effect. Similar comparisons in the other stimulus
combinations reveal that, with one exception, contrast effects were present
throughout and significant (p < .01) on three of the four stimulus continua.
Thus, a release burst between the two signal portions by no means reduced the
perceptual interaction between them. This suggests that the contrast is not a
purely auditory effect.

Table 6

Forward-With-Release Condition: Identification of Synthetic CV
Syllables in the Context of Different Natural VC Portions
That Include the Original Closure and Release Burst

                 /ba/-/da/ continuum      /da/-/ga/ continuum
VC context          Response to CV portion (percent)
                  "b"    "d"    "g"       "b"    "d"    "g"

GC
ab(da)             37     48     15         0     57     43
ab(ga)             26     27     47         0     27     73
ad(ba)           [?2]     24   [?4]         6     33     61
[ad(ga)]           43      6    [ ]         1     10     89
ag(ba)             52     46    [ ]        11     59     29
ag(da)             41     57      2         1     72     27

[BR]
ab(da)             49     50    [ ]         5     53     42
ab(ga)             40     34     26         4     26     70
ad(ba)           [11]     22      3        24     22     54
ad(ga)             55     23     22        10     18     72
ag(ba)            [ ]     52      1         8     65     27
ag(da)             29     70      1         1     81     18

Results of acoustic analyses of the stimuli are presented in the
Appendix.

Discussion

The present experiment, though complex in detail, permits some fairly
straightforward conclusions. A summary of the basic findings is presented in
Table 7. The three rows of this table represent the Backward, Forward, and
Forward-With-Release conditions, respectively, and the three columns show the
percentages of trials on which the target consonant (Ci) was classified into
the category represented by the context (C1), into the category represented by
the excised stimulus portion (C2), and into the category not represented in
the original utterance (C3). The following conclusions may be drawn:



Table 7

Summary of VC-CV Data

                                             Ci perceived as (percent):
Condition               Stimulus               C1       C2       C3

Backward                VCi-(VC2)C1V          29.3     35.5     35.2
Forward                 VC1(C2V)-CiV          28.2     35.8     36.0
Forward-With-Release    VC1+b(C2V)-CiV        17.7     51.0     31.3



(1) In sequences of two nonhomorganic stop consonants, there is no
coarticulatory information in either the VC or the CV formant transitions.
(In Table 7, the percentages of C2 and C3 responses do not differ in either
the Backward or the Forward condition.) While these negative perceptual
findings leave open the possibility that coarticulatory information was
present but was not utilized by listeners, the acoustic analysis suggests that
there simply were no coarticulatory shifts in stop place of articulation.
These negative findings override the few suggestions of coarticulatory effects
in the perception of isolated VC and CV stimuli.

(2) There is coarticulatory information about a following stop in the VC
release burst, and this information can be used by listeners despite the
relative weakness of the burst as an acoustic event. (In Table 7, third row,
the percentage of C2 responses can be seen to be substantially higher than
that of C3 responses.) Although the burst derived from the release of the
occlusion for the first stop, its spectrum is apparently influenced by the
configuration of the articulators as they move towards (or have already
attained) the occlusion for the second stop.

(3) Both proactive and retroactive contrast effects were observed, even
though the instructions encouraged independent processing of VC and CV
portions. (As can be seen in Table 7, C1 responses were less frequent than
either C2 or C3 responses in all three conditions.) This supports earlier
findings and suggests that the contrast effects are perceptual in origin, not
due to some kind of response bias.
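
The three conclusions can be read directly off the Table 7 percentages. The
sketch below is only an illustration: the names coarticulation_index and
contrast_index are hypothetical labels introduced here, not measures defined
in the paper, and the Backward C2 entry is assumed to be 35.5, the value that
makes that row sum to 100 percent.

```python
# Percentages of target-consonant responses from Table 7.
# Assumption: the Backward C2 entry is read as 35.5, the value that makes
# the row sum to 100 percent.
table7 = {
    "Backward": {"C1": 29.3, "C2": 35.5, "C3": 35.2},
    "Forward": {"C1": 28.2, "C2": 35.8, "C3": 36.0},
    "Forward-With-Release": {"C1": 17.7, "C2": 51.0, "C3": 31.3},
}

def summarize(row):
    """Return the row total and two illustrative difference scores."""
    return {
        "total": round(sum(row.values()), 1),
        # C2 minus C3: excess of responses matching the excised portion.
        "coarticulation_index": round(row["C2"] - row["C3"], 1),
        # Mean of C2 and C3 minus C1: avoidance of the context category.
        "contrast_index": round((row["C2"] + row["C3"]) / 2 - row["C1"], 1),
    }

summary = {name: summarize(row) for name, row in table7.items()}
for name, stats in summary.items():
    print(name, stats)
```

On these figures the C2 minus C3 difference is near zero in the Backward and
Forward rows and large only in the Forward-With-Release row, while C1 is the
least frequent response in every row.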

The major theoretical question addressed by this paper was whether the
perceptual contrast effects in VC-CV sequences might be caused by listeners'
compensation for coarticulatory shifts in the places of articulation of
adjacent stop consonants. The present results suggest a negative answer.
This leaves two possible explanations of the contrast effects.

One explanation rests on the assumption of complex auditory interactions
between the spectral cues for place of articulation on either side of the
closure interval. This hypothesis cannot be ruled out at present, and we need
to learn a lot more about the perception of complex auditory signals before it
can be fully evaluated. The present data do suggest that acoustic stimulus
properties influence the magnitude of contrast effects, but these influences
may be superimposed on a basic effect of a different origin.

The alternative explanation for this effect is that the silent closure
interval, rather than merely separating the VC and CV portions, provides
information about the number of stop consonants involved. According to this
hypothesis, listeners possess tacit knowledge about the temporal properties of
speech and, specifically, of the fact that the closures of two-stop sequences
are longer than those of single stops but shorter than those of double
(geminate) stops (Westbury, Note 1; however, see also Raphael, Dorman, &
Isenberg, 1979). In this view, then, contrast effects do not derive from some
perceptual interaction between the VC and CV portions, as a psychophysical
view of speech perception would have it; rather, they are assumed to derive
from the perceptual integration of information provided by the VC and CV
formant transitions and by the closure interval itself. In other words, they
derive from the fact that listeners interpret speech signals with reference to
their knowledge of the normative properties of speech. This, after all, is
the essence of phonetic perception.

The basic principles of phonetic perception also account for a variety of
other context effects in speech perception (see Repp, 1982). However, the
precise causes of different context effects may vary. The effects of
preceding fricatives and liquids on stop consonant perception still suggest
coarticulatory dependencies, for, in these cases, the duration of the stop
closure interval seems to carry little information about changes in place of
articulation, even though it may constitute a secondary cue to specific places
of stop articulation (Bailey & Summerfield, 1980). In the case of two-stop
sequences, however, the information conveyed by closure duration seems to be
the major cause of (what has been mistakenly believed to be) perceptual
contrast.

The perceptual effects of VC release bursts obtained in this study have
no direct implications for the interpretation of "contrast" effects, save for
the fact that contrast persisted in the presence of VC release bursts, which
further reduces the plausibility of any simple auditory interaction
hypothesis. The coarticulatory information carried by VC release bursts was
due not to articulatory accommodation (i.e., shifts in place of articulation)
but to articulatory transition and overlap. The perceptual salience of the
acoustic changes wrought by this form of coarticulation illustrates once again
the multiplicity of cues to stop place of articulation (cf. Dorman & Raphael,
1980) and listeners' exquisite sensitivity to the detailed spectral properties
of the speech signal.

APPENDIX: ACOUSTIC ANALYSES

Detailed acoustic analyses of the stimuli were conducted to reveal the
sources of the coarticulatory effects in the Forward-With-Release condition.
The results of these analyses are reported below.

Temporal Measurements

Method. The durations of (a) the closure interval preceding the VC
release burst (VC closure), (b) the VC release burst itself, (c) the closure
interval following the VC release burst (CV closure) were measured on a
large-scale oscillographic display to the nearest millisecond. There was
generally little uncertainty about the beginning of (b) and about the end of
(c). The precise beginnings of (a) and (c) were somewhat more difficult to
define (cf. Figure 1), but an attempt was made to follow consistent criteria:
a significant reduction in voicing amplitude for the onset of (a), and a
return to near-baseline energy for the onset of (c). The sum of the three
measures yielded the total closure duration.
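
As a small worked illustration of this additive definition, the sketch below
recomputes the total closure duration for two of GC's utterances. The values
are the mean durations as read from the scanned duration table; the function
and variable names are assumptions made for the example.

```python
# Mean segment durations in msec for two GC utterances, as read from the
# duration table: (VC closure, VC release burst, CV closure, reported total).
measurements = {
    "abda": (77, 16, 45, 138),
    "abga": (71, 20, 44, 135),
}

def total_closure(vc_closure, burst, cv_closure):
    """Total closure duration as the sum of its three measured parts."""
    return vc_closure + burst + cv_closure

for utterance, (vc, burst, cv, reported) in measurements.items():
    computed = total_closure(vc, burst, cv)
    print(utterance, computed, "matches" if computed == reported else "differs")
```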

Statistical tests were conducted on each of the four sets of measures
separately for each speaker, using the between-token variability as an error
estimate. Since the places of articulation of the first and second stop
consonants (C1 and C2) were not orthogonal factors, their effects on the
segment durations of interest were evaluated by means of simple F-tests for
planned comparisons. Effects of C1 were assessed by comparing pairs of
utterances in which C2 did not vary (/adba/-/agba/, /abda/-/agda/, /abga/-
/adga/), and effects of C2 were assessed by comparing utterances in which C1
was constant (/abda/-/abga/, /adba/-/adga/, /agba/-/agda/).
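
The pairing scheme can be made explicit with a few lines of code. This is a
sketch of the comparison logic only (it performs no statistics); the helper
name planned_pairs is hypothetical.

```python
from itertools import combinations

stops = ["b", "d", "g"]
# The six VCCV utterances with two different (nonhomorganic) stops.
utterances = [f"a{c1}{c2}a" for c1 in stops for c2 in stops if c1 != c2]

def planned_pairs(varying):
    """Pair utterances that differ only in the stop at position `varying`
    (1 = first stop C1, 2 = second stop C2)."""
    fixed = 2 if varying == 1 else 1
    pairs = []
    for u, v in combinations(utterances, 2):
        if u[fixed] == v[fixed] and u[varying] != v[varying]:
            pairs.append((u, v))
    return pairs

c1_pairs = planned_pairs(1)  # C2 held constant
c2_pairs = planned_pairs(2)  # C1 held constant
print("C1 effects:", c1_pairs)
print("C2 effects:", c2_pairs)
```

Because each stop can be followed by only two other stops, every factor level
yields exactly one pair, which is why three comparisons per factor suffice.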

Results and discussion. Mean durations, standard deviations calculated
from the five (occasionally four) tokens of each utterance, and the results of
the significance tests are displayed in Table 8. The results are in close
agreement with earlier measurements of similar utterances reported by Repp
(1980b).

The duration of the VC closure was affected by the place of articulation
of C1, being longest for /b/, but not by that of C2. Thus, this portion of
the closure did not convey any significant coarticulatory information.

The duration of the VC release burst also depended primarily on C1, being
shortest for /b/. It seems that this variable, too, contained little specific
information about C2. The shorter duration of labial bursts may account for
the lower C2 recognition scores from coarticulatory cues when C1 was labial
(Table 3), and it makes the large effect of labial bursts in hybrid VC-CV
utterances (Table 6) seem even more curious.

The duration of the CV closure, on the other hand, was strongly
influenced by the place of articulation of C2, being longest for /b/,
especially in BR's utterances. Thus, this portion of the closure may have
provided a cue to the place of articulation of C2 in the VC-CV hybrid stimuli.
However, note that the strongest coarticulatory effects in VC-CV perception
were obtained when C1 was labial, whereas Table 8 shows that precisely in this
case the CV closure provided little information about C2. This observation
argues strongly for spectral properties of the VC release burst as the
principal source of coarticulatory information.

Table 8

Average Durations of VC Release Bursts and Closure Intervals
in Milliseconds (Standard Deviations in Parentheses)

Utterance    VC closure    VC release burst    CV closure    Total closure

GC
abda          77 (20)         16 (4)            45 (5)        138 (21)
abga          71 (11)         20 (5)            44 (10)       135 (18)
adga          52 (17)         26 (8)           [ ] (12)       116 (11)
adba          53 (13)         20 (10)           56 (19)       126 (12)
agba          60 (13)        [ ] (9)            59 (12)       138 (17)
agda          56 (5)          29 (9)            53 (9)        138 (6)
Average       62 (10)        [ ] (5)            49 (8)        132 (9)

BR
abda          66 (6)         [ ] (10)           67 (5)        147 (5)
abga          68 (15)        [ ] (2)            71 (6)        149 (9)
adga          57 (9)         [ ] (7)            57 (11)       141 (10)
adba          57 (4)          19 (7)            93 (5)        169 (13)
agba          41 (12)         29 (2)            89 (10)       159 (8)
agda          53 (7)         [ ] (4)            75 (6)        150 (9)
Average       57 (7)         [ ] (4)            75 (6)        150 (9)

* p < .05     *** p < .001

It might be hypothesized that whatever spectral cues the bursts contain
will be more effectively perceived the longer a burst lasts. To test this
hypothesis, the burst duration measurements for the five (sometimes four)
individual tokens of each utterance were correlated with the average response
percentages in the relevant category to the same tokens in the
Forward-With-Release condition. There was some relationship in the GC set
(average r = 0.45) but not in the BR set (average r = -0.05), suggesting that
long bursts conveyed only little more information than short bursts.
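
The "average r" reported here is a mean of correlations computed across the
tokens of each utterance. A minimal sketch of that computation follows. The
token values are invented for illustration and are not the study's data, and
the simple arithmetic mean of the r values is an assumption, since the text
does not say how the correlations were averaged.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_within_utterance_r(tokens_by_utterance):
    """Mean of the per-utterance correlations computed across tokens."""
    rs = [pearson_r(durs, pcts) for durs, pcts in tokens_by_utterance.values()]
    return sum(rs) / len(rs)

# Hypothetical illustration only: burst durations (msec) and response
# percentages for the tokens of two utterances. These are NOT the study's data.
example = {
    "abda": ([12, 15, 16, 18, 20], [40, 45, 50, 48, 60]),
    "agba": ([25, 27, 30, 31, 33], [55, 50, 58, 52, 57]),
}
print(round(average_within_utterance_r(example), 2))
```

With only four or five tokens per utterance, each individual r is very noisy,
which is one reason for averaging across utterances before interpreting it.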

Amplitude Measurements 

An integrated measure of VC burst amplitude was obtained from the first
15 msec of each burst. The burst amplitudes showed surprisingly little
relation to the burst durations (r = -0.17 in the GC set; r = 0.38, p < .05,
in the BR set). For both speakers, labial bursts were significantly weaker
than alveolar and velar bursts, and while GC produced stronger alveolar than
velar bursts, BR did the opposite. These differences are obviously correlated
with the percent-correct scores for C2 shown in Table 3. Correlations
computed over tokens within each utterance revealed moderate relationships
between burst amplitude and C2 recognition in the Forward-With-Release
condition (average r = .50 in the GC set; average r = .30 in the BR set). This
suggests that listeners were able to extract more coarticulatory information
from strong bursts than from weak ones. That the relationship was not very
strong, however, is further suggested by the fact that BR's bursts were
generally much weaker than GC's; nevertheless, both sets of stimuli led to
nearly equal perceptual effects (Tables 3 and 6).

Spectral Measurements 

Method. The spectrum of the initial 15 msec of each VC release burst was
obtained using an FFT program with a 20-msec Hamming window whose left edge
was placed 5 msec before burst onset. No pre-emphasis was applied. The
resulting spectra were smoothed by linearly averaging over approximately 400
Hz, moving across the frequency scale in steps of roughly 20 Hz. For purposes
of graphic display, the spectra were amplitude-normalized, and average spectra
were computed from all tokens of a given utterance. Estimates of the formant
frequencies in the vocalic portions preceding and following the closure had
been obtained previously using an UA-A6 Federal Scientific Spectrum Analyzer
(see Repp & Mann, 1982, for details of this method).
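
The windowing and smoothing steps can be sketched as follows. This is an
illustration under stated assumptions, not the original analysis program: the
sampling rate is not given in the text (10 kHz is assumed in the usage
example), a plain discrete Fourier transform stands in for the FFT, and zero
padding to 512 points is an assumed way of obtaining frequency steps of
roughly 20 Hz. The function name and parameters are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def burst_spectrum(signal, fs, burst_onset_s, win_s=0.020, lead_s=0.005,
                   n_fft=512, smooth_hz=400.0):
    """Smoothed magnitude spectrum of a release burst.

    A window of win_s seconds is placed lead_s seconds before burst onset,
    so it covers the first (win_s - lead_s) seconds of the burst.
    """
    start = int(round((burst_onset_s - lead_s) * fs))
    n = int(round(win_s * fs))
    frame = signal[start:start + n]
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    # Direct DFT of the zero-padded frame (padding samples contribute nothing).
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n_fft)
                  for t, x in enumerate(windowed))
        mags.append(abs(acc))
    # Linear (moving-average) smoothing over roughly smooth_hz.
    bin_hz = fs / n_fft
    half_width = max(1, int(round(smooth_hz / bin_hz / 2)))
    smoothed = []
    for k in range(len(mags)):
        lo, hi = max(0, k - half_width), min(len(mags), k + half_width + 1)
        smoothed.append(sum(mags[lo:hi]) / (hi - lo))
    freqs = [k * bin_hz for k in range(len(mags))]
    return freqs, smoothed

# Usage on a synthetic 1500-Hz tone (assumed 10-kHz sampling rate).
fs = 10000
tone = [math.sin(2 * math.pi * 1500 * t / fs) for t in range(2000)]
freqs, spectrum = burst_spectrum(tone, fs, burst_onset_s=0.05)
print("peak near", freqs[spectrum.index(max(spectrum))], "Hz")
```

Smoothing over about 400 Hz trades frequency resolution for stability, which
suits a search for broad spectral peaks in short, noisy burst segments.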

Results and discussion. Figure 7 compares the average spectra of release
bursts for the same C1 in the context of different following stops. All burst
spectra contained significant amounts of energy in the region of the first
formant (F1), which may indicate the presence of residual voicing during the
closure. These F1 peaks were not sensitive to C2 context, however. In
contrast, it can be seen that coarticulatory information about C2 resided in
the second-formant (F2) region, between 1000 and 2000 Hz. The most striking
difference occurred for velar bursts: /g(b)/ bursts had F2 peaks at
considerably lower frequencies than did /g(d)/ bursts. Similarly, /b(g)/
bursts had F2 peaks at lower frequencies than /b(d)/ bursts. No such
difference is evident for /d/ bursts, but /d(g)/ bursts had a pronounced
energy minimum around 1000-1200 Hz, whereas /d(b)/ bursts did not.

Table 9

Average F2 Frequencies in VC Release Bursts and in Vocalic Portions
Immediately Preceding and Following the Closure, in Hz
(Standard Deviations in Parentheses)

Utterance     VC offset       VC burst        CV onset

GC
abda          1444 (65)       1734 (77)       1868 (46)
abga          1412 (46)       1383 (106)      1772 (59)
adga          1732 (53)       [ ] (138)       1860 (35)
adba          1728 (36)       2070 (241)      1516 (26)
agba          1652 (27)       1511 (138)      1424 (71)
agda          1652 (33)       1[?]86 (99)     1840 (28)

BR
abda          1012 (23)       1586 (167)      1416 (61)
abga          1084 (30)       1183 (107)      1400 (42)
adga          1296 (43)       1570 (99)       1532 (27)
adba          1292 (30)       1629 (58)       1100 (47)
agba          1276 (61)       1274 (97)       1084 (56)
agda          1316 (36)       1658 (41)       1415 (38)

* p < .05     ** p < .01     *** p < .001

Table 9 lists, in its center column, the average F2 peak frequencies of
the various bursts. Despite the small number of tokens and considerable
variability, the effects of C2 context on labial and velar bursts were highly
significant in t-tests. At the same time, of course, the F2 frequencies
reflected the place of articulation of C1, being lowest for /b/ and highest
for /d/. (The statistical results for the effects of C1, most of which were
highly significant, are omitted from the table for the sake of clarity.) Note,
however, that the effects of C2 on the F2 peak frequency were at least as
large as those of C1.
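
The relative sizes of the C1 and C2 effects can be illustrated with simple
differences between burst means. The sketch below uses BR's burst F2 values
as read from the scanned table; the absolute-difference measure and the
function names are assumptions made for this example, not the t-tests
reported by the author.

```python
# Mean F2 peak frequencies (Hz) of BR's VC release bursts, as read from the
# center column of the F2 table; keys are (C1, C2).
burst_f2 = {
    ("b", "d"): 1586, ("b", "g"): 1183,
    ("d", "g"): 1570, ("d", "b"): 1629,
    ("g", "b"): 1274, ("g", "d"): 1658,
}

def c2_effect(c1):
    """Absolute F2 difference between the two C2 contexts of a given C1."""
    values = [f2 for (first, _), f2 in burst_f2.items() if first == c1]
    return abs(values[0] - values[1])

def c1_effect(c2):
    """Absolute F2 difference between the two C1 bursts sharing a given C2."""
    values = [f2 for (_, second), f2 in burst_f2.items() if second == c2]
    return abs(values[0] - values[1])

for stop in "bdg":
    print(stop, "C2 effect:", c2_effect(stop), "C1 effect:", c1_effect(stop))
```

On these readings the C2 effect is large for labial and velar bursts and small
for alveolar bursts, matching the pattern described in the text.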

Table 9 also lists, for comparison, the frequencies of F2 in the voiced
signal portions immediately preceding and following the closure interval. It
is evident that, in general, the F2 frequency of the burst did not lie on a
trajectory between the VC and CV frequencies. It can also be seen that, while
VC and CV frequencies primarily reflected the place of articulation of C1 and
C2, respectively, there were some significant coarticulatory effects. One of
them, a lower onset frequency of F2 in /(ab)ga/ than in /(ad)ga/, was obtained
for both speakers. However, these coarticulatory variations were apparently
not effective as perceptual cues (Tables 4 and 5).

Given these systematic spectral differences, some relation might be
expected between F2 peak frequency and listeners' responses in the perceptual
experiments. For example, /g(b)/ bursts with very low F2 frequencies should
lead to especially high proportions of "gb" responses, and /g(d)/ bursts with
very high F2 frequencies should lead to the highest proportions of "gd"
responses. Unfortunately, this hypothesis found no support in a correlational
analysis. This leaves open the question of what aspect of the VC release
bursts actually conveyed the coarticulatory information. It may have been
some more complex spectral property than the F2 peaks considered here. This
is also suggested by the fact that alveolar bursts, which did not vary
significantly in F2 frequency, did transmit coarticulatory information.
Further research will be required to determine the precise nature of the
relevant cues.


D. H. Whalen



Abstract. When an [s] or [S] fricative noise is combined with
vocalic formant transitions appropriate to a different fricative,
the resulting consonantal percept is usually that of the noise. To
see if the mismatch affects processing time, five experiments were
run. Three experiments examined reaction time for identification of
[s] and [S], as well as the whole syllable (in one experiment) or
only the vowel (in the others). The stimuli contained either
appropriate or inappropriate formant transitions, and the vowel
information in the noise was either appropriate or not. Subjects
were significantly slower in all tasks in identifying stimuli with
inappropriate transitions or inappropriate vowel information.
Similar results were obtained with stop-vowel syllables in which the
release bursts of syllable-initial [p] and [k] were transposed in
syllables containing the vowels [a] and [u]. In the fifth
experiment, enough silence was introduced between the initial fricatives
and vocalic segment for the vocalic formant transitions to be
perceived as a stop (e.g., [stu] from [su]). Mismatched transitions
then had a much reduced effect on reaction time, while mismatches of
vowel quality slowed identification even more. The results indicate
that listeners take into account all available cues, even when the
phonetic judgment seems to be based on only some of the cues.

INTRODUCTION

It is well known that information about a phone is temporally spread in
the speech signal. It is usually impossible to isolate one piece of the
signal and identify it as one single phone. Even when such a segmentation
results in a stretch of sound that is identifiable as a single phone,
information about neighboring phones usually remains. The vowels of
consonant-vowel syllables, for example, can be identified at better than
chance levels from the excised stop consonant release bursts (Blumstein &
Stevens, 1980; Kewley-Port, 1980; LaRiviere, Winitz, & Herriman, 1975b) or
from the excised fricative noises (LaRiviere, Winitz, & Herriman, 1975a;
Yeni-Komshian & Soli, 1981).

The vowel information in stop bursts and frictions is quite weak. This 
is evident in our saying that these vowels can be identified at a "better than
chance" level. If the percept were strong, the vowel would be as easy to
identify from the part as from the whole syllable. There is not that much
information available. Rather than constructing a vowel percept, the subject
can infer what vowel must have been present.

The vowel information in a stop release burst is also not a strong enough
vowel cue to override the information in the vocalic segment. If a release
burst from [pa], for example, is replaced with one from [pu], our perception
of the vowel does not change, although there is vowel information in the
burst. An artificial mismatch of that sort, in which a cue is put in a new
environment in which its cue-value is not sufficient to change the phonetic
percept, will be called a subcategorical phonetic mismatch. The cue that gets
overridden in that way will be called a mismatched cue. There are three ways
a listener can treat a mismatched cue: 1) she can reject it, so that a
nonspeech click, pop, whistle, etc., is perceived in addition to the speech; 2)
she can integrate it with the overriding cue in such a way that
within-category variation is perceived (as could be determined with a
discrimination test); or 3) she can ignore it. The experiments described in
this paper will show that mismatched cues impose a processing load. Thus the
"act of ignoring" a cue (or possibly within-category variation) takes time.
This supports the notion that listeners are sensitive to all the information
they gather and attempt to incorporate it into the percept.

Note that in order to know whether to accept or reject a mismatched cue, 
the listener must know what a possible speech sound is. If she treats the cue 
as non-linguistic noise, it must be because she could not make linguistic
sense of the auditory pattern. In extreme cases, there may be gross auditory
discontinuities. Mismatched cues, in similar but appropriate contexts, can be
integrated. Thus it is not sufficient to say that mismatched cues are not
speech-like; given the proper environment, they are quite natural and provide
phonetic information appropriate to the speech sounds they were originally 
produced with. It requires a complete knowledge of phonetic possibilities to 
know whether a cue is in its appropriate environment or not. 

Two kinds of mismatched cues were studied in the present experiments: 1)
vowel information in fricative noises and stop consonant release bursts, and
2) the place of articulation information in stop bursts and in vocalic formant
transitions of vocalic segments occurring with fricatives. The information
about a fricative's place of articulation in formant transitions has been
shown to influence phonetic identification when the friction cue is ambiguous
(Harris, 1958; Mann & Repp, 1980; Whalen, 1981). Unambiguous fricative
noises, on the other hand, seem to override mismatched transitions completely
in following vocalic segments. The perception of vowels following frictions
that were originally produced with other vowels is similarly unaffected by
that mismatched information.

A similar situation sometimes occurs with syllable-initial stops. If we
exchange release bursts from stops produced at different places of
articulation, the bursts often determine the place of the resulting stop
percept. Other times, however, the transitions will be the deciding cue.
Sometimes the perceived place will be different from both that cued by the
burst and that cued by the transitions (Fischer-Jørgensen, 1972). (Unlike the
fricative noises, no stop burst, it seems, provides an unambiguous cue to place
in all vocalic contexts; cf. Blumstein & Stevens, 1980; Dorman,
Studdert-Kennedy, & Raphael, 1977.) Yet another parallel occurs with medial
stops. If the transitions into and out of medial stops conflict, the second
(opening) set usually determines the percept with no audible contribution of
the closure transitions (Dorman, Raphael, Liberman, & Repp, 1975; Fujimura,
Macchi, & Streeter, 1978). Stimuli with such conflicting transitions are
difficult to discriminate from stimuli with matched transitions (Repp, 1977).

In many stimuli with mismatched cues, then, no overt ambiguity results,
and the mismatch escapes conscious detection. However, it could be that the
assignment to a phonetic category was in fact slower when some cue or another
was inappropriate. Delays in identification have been shown in stimuli with
overt ambiguities (Pisoni & Tash, 1974; Repp, 1981b). An alternative view
hypothesizes that the listener's perceptual system would treat the overriding
cue for a phone as sufficient and ignore the "subcategorical" mismatches
completely. In this case, a listener would be able to identify, say, an
alveolar fricative equally fast whether the transitions of the vocalic segment
it occurred with were appropriate or not.

The first view presumes that the perceptual mechanism tries to include
the phonetic value of each cue in the percept, whether that cue is strictly
necessary to the identification or not. The latter view presumes that the
perceptual system attempts to make a justifiable phonetic assignment as soon
as possible (as in Blumstein & Stevens, 1980; Cole & Scott, 1974; Klatt, 1979;
Stevens, 1975). The former proposal will be called the "integrating" account,
since the proposed mechanism attempts to integrate (over time and frequency)
all information reaching it into a unified percept (see Liberman, 1979;
Liberman & Studdert-Kennedy, 1978; and Repp, 1982, for recent reviews of the
relevant literature). The latter will be called the "disposing" account, 
since its mechanism attempts to dispose of each portion of the speech signal 
(by passing a phonetic judgment on to another part of the system), as it is 
received. 

Consider first the case of mismatched cues that precede the overriding
cue in the speech signal. Several studies have shown that such mismatches
slow decision time. Subcategorical mismatches of transitions into medial stops
resulted in slower decision times in a speeded lexical decision task (Streeter
& Nigro, 1979). (The effect only appeared for words, not for nonwords.)
Martin and Bunnell (1981) have shown that identification of final [i] and [u]
is slowed when a preceding fricative or fricative-stop cluster was originally
produced before the other vowel. Later studies (Martin & Bunnell, 1982)
examined vowel-to-vowel coarticulation with similar results.

The integrating account does not need any additions to explain these
results. A listener need only notice that conflicting cues are present, and
she will attempt to integrate them into the phonetic percept. That these cues
can provide information is shown by their determining the percept when the
(normally) overriding cue is ambiguous. The disposing account can, with some
additions, also explain the stop data by assuming that a phonetic decision is
made on the basis of the closure transitions, but that the decision is not
firm enough to allow it to generate the phonetic percept. When the opening
transitions conflict with the decision based on the closure transitions, it
would presumably take some extra time to set up another phone as the percept.
The mechanism of the disposing account must also generate a (preliminary)
vowel percept based on the friction (to account for Martin & Bunnell's, 1982,
data).

The situation that distinguishes these theories occurs when the
conflicting cues follow the overriding cue. The integrating account predicts
that such cues will be as slowing as those that precede the overriding cue. An
initial fricative followed by inappropriate transitions should give longer
identification times. The disposing account, on the other hand, predicts no
delay due to following misinformation, since the correct decision would
already have been made.

Figure 1 is a comparison of the predictions of the disposing and
integrating accounts. When the mismatched cues precede the overriding cue,
both theories predict that mismatches will slow response time. The disposing
account assumes that the identification will take longer to reach criterion
level, while the integrating account assumes that the integration of
conflicting information takes longer than integrating compatible information.
(The figure is oversimplified by assuming that integration does not begin until
all the cues have been received; this is done for convenience of display only.)
When the mismatched cue follows the overriding cue, the disposing theory
predicts identical times for both matched and mismatched versions of the
stimuli, while the integrating account predicts a delay for mismatches.
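
The contrast between the two accounts can be restated as a small decision
table. The sketch below only illustrates the predictions as described in the
text; the function name and the string labels "before" and "after" are
assumptions introduced here and do not come from the original study.

```python
def predicts_delay(account: str, mismatch_position: str) -> bool:
    """Return True if the account predicts slower identification.

    account: "integrating" or "disposing" (the two accounts named in the text).
    mismatch_position: "before" or "after", i.e. whether the mismatched cue
        precedes or follows the overriding cue (hypothetical labels).
    """
    if account not in ("integrating", "disposing"):
        raise ValueError(f"unknown account: {account!r}")
    if mismatch_position not in ("before", "after"):
        raise ValueError(f"unknown position: {mismatch_position!r}")

    if mismatch_position == "before":
        # Both accounts expect a slowdown when the mismatch comes first.
        return True
    # The mismatch follows the overriding cue: only the integrating
    # account expects the later information to cost time.
    return account == "integrating"


for account in ("disposing", "integrating"):
    for position in ("before", "after"):
        print(account, position, predicts_delay(account, position))
```

The only cell that separates the accounts is the disposing account with a
following mismatch, which is the case the later experiments are designed to
probe.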

The present paper reports five experiments examining speeded
identification of fricatives, vowels, stops, and whole syllables with and
without mismatched cues. In the first experiment, the overriding cue came after
the conflicting cue. This will confirm the other results mentioned above. For
three of them, however, the overriding cue came before the conflicting cue.
The integrating account predicts a delay, while none is predicted by the
disposing account. In the last experiment, the transitions of the
fricative-vowel syllables were allowed to affiliate with a different phone
(i.e., a stop) by inserting silence between the noise and the vocalic segment.
The integrating account predicts a reduction in the effect of mismatches here,
while the disposing account still predicts no effect.




EXPERIMENT 1 

Experimental Procedure

Materials. A male native speaker of English recorded ten tokens of each
of the syllables [as], [aS], [is], [iS], [os], [oS], [us] and [uS] on magnetic
tape. These were low-pass filtered at 10 kHz and digitized at a sampling rate
of 20 kHz. Two tokens of each syllable were chosen so that the vocalic
portion of all eight were of equal duration, the friction of all eight were of
equal duration, and, of course, the original syllables and all combined
syllables were also of equal duration. All judgments were thus given to
stimuli of equal duration. A vocalic segment duration of 200 msec was found
naturally in eight syllables. Seven were shortened by cutting off between 10
and 50 msec from the onset of the vowel; the resulting abruptness did not
sound unnatural. The eighth vocalic portion was lengthened 20 msec by
repeating its first pitch pulse three times. The frictions were 250 msec in
duration; nine were shortened by removing between 10 and 50 msec from near the
end of the signal.
Once the tokens had been selected and the durations equalized, each
friction was combined with each vocalic segment, including the one it was
originally produced with. The resulting 256 stimuli fell into four categories
of interest: 1) The vocalic formant transitions had been produced with the
same fricative as the percept generated by the noise ("appropriate
transitions") and the vowel was the same as the vowel the fricative had
originally been produced with ("appropriate vowel"); 2) The transitions were
appropriate but the vowel was inappropriate; 3) The vowel was appropriate but
the transitions were inappropriate; and 4) Both the transitions and the vowel
were inappropriate.
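
The crossing just described can be enumerated directly. The following is a
minimal sketch, assuming that each source syllable is identified by its vowel,
its fricative, and a token number, and that "appropriate" simply means the
friction and the vocalic segment came from syllables sharing that fricative
(for the transitions) or that vowel (for the vowel information). The variable
and function names are introduced here for illustration only.

```python
from collections import Counter
from itertools import product

VOWELS = ("a", "i", "o", "u")
FRICATIVES = ("s", "S")
TOKENS = (1, 2)

# Sixteen source syllables: 4 vowels x 2 fricatives x 2 tokens.
SOURCES = list(product(VOWELS, FRICATIVES, TOKENS))


def classify(friction_source, vocalic_source):
    """Return (transitions_appropriate, vowel_appropriate) for one stimulus."""
    f_vowel, f_fricative, _ = friction_source
    v_vowel, v_fricative, _ = vocalic_source
    # Transitions in the vocalic segment match the friction's fricative.
    transitions_ok = f_fricative == v_fricative
    # Vowel information in the friction matches the vowel actually heard.
    vowel_ok = f_vowel == v_vowel
    return transitions_ok, vowel_ok


# Every friction combined with every vocalic segment.
counts = Counter(classify(f, v) for f, v in product(SOURCES, SOURCES))

for category, n in sorted(counts.items(), reverse=True):
    print(category, n)
print("total:", sum(counts.values()))
```

Under these assumptions the sixteen frictions and sixteen vocalic segments
give the 256 combinations mentioned in the text. The four categories are not
equally populated, because for any given friction only one of the four vowels
counts as appropriate.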

Some mismatches of vowel and the vowel information in the friction gave
rise to perceived [i] or [u] offglides on the vowel (as detailed in Whalen,
1982). Thus there is a mixture of cue status here; some are mismatched, and
some are reinterpreted as an added phone. Whalen (1982) showed that the
transitions did not contribute to the diphthong percepts. Thus the mismatched
transitions are clearly subcategorical mismatches. The effect of mismatched
vowel quality was not as readily attributable to subcategorical mismatches,
since not all of the vowel quality cues were ignored.

Each session consisted of four blocks of stimuli. Each block contained
128 trials, plus four "warm-up" stimuli at the beginning (which were not
tallied in the results). One token of each stimulus occurred once within the
first two blocks, and once within the second two; the order was otherwise
random. The stimuli were recorded on one channel of an audiotape while, on
the other channel, a timing tone was recorded simultaneously with the onset of
the stimulus. The inter-stimulus interval was 2300 msec.

Subjects. Two groups of subjects were tested, expert and naive. The
expert listeners were 10 researchers at Haskins Laboratories, all of whom were
phonetically trained. Two were left-handed. The naive subjects were 10 young
adults, all native speakers of English, who had volunteered for experiments at
Haskins Laboratories and were paid for their participation. One was
left-handed.

Apparatus. Subjects were seated in a quiet room and heard the stimuli
over Telephonics TDH-39 headphones. Their responses were made by pressing one
of two buttons on a panel in front of them. The "s" response was on the left
and the "sh" response on the right. During the test, if the answer was
correct and within a predefined time limit (longer than 100 msec and shorter
than one second), a small light on the control box in front of them lit up.
Their response time, answer, and the correctness of that answer went into a
computer file after each trial.

Procedure

The subjects were instructed to identify the fricative as quickly as 
possible. They were told to expect a few mistakes, but to slow down if they
made too many. The feedback light was explained to them. Thirty stimuli were
run but not scored to give them practice. After it had been determined that
there were no questions, two blocks were run with a thirty-second pause
between, followed by a short break. The next two blocks, separated by a
thirty-second pause, finished the session. 
Results

Only correct responses within the specified time limits were included in
the analysis of the results. Thus responses that were too long (over one
second) or too short (under 100 msec) were counted as mistakes. This gave an
overall error rate of 5.4%.
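
The scoring rule above can be written out as a short filter. This is a sketch
under stated assumptions: the limits are the 100 msec and one second values
given in the text, treated as strict bounds, while the function names and the
example trials are hypothetical and are not data from the experiment.

```python
MIN_RT_MS = 100.0   # responses at or below this are counted as mistakes
MAX_RT_MS = 1000.0  # responses at or above one second are counted as mistakes


def is_scored(correct: bool, rt_ms: float) -> bool:
    """A trial enters the analysis only if it is correct and within limits."""
    return correct and MIN_RT_MS < rt_ms < MAX_RT_MS


def error_rate(trials) -> float:
    """Proportion of trials counted as mistakes.

    trials: iterable of (correct, rt_ms) pairs.
    """
    trials = list(trials)
    mistakes = sum(1 for correct, rt_ms in trials if not is_scored(correct, rt_ms))
    return mistakes / len(trials)


# Hypothetical trials: one too fast, one wrong, one too slow, two scored.
example_trials = [
    (True, 520.0),
    (True, 95.0),
    (False, 610.0),
    (True, 1200.0),
    (True, 480.0),
]
print(error_rate(example_trials))
```

Note that under this rule a correct but out-of-range response is treated the
same way as a wrong answer, so the reported error rate mixes the two.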


As can be seen from Figure 2, inappropriateness of transition slowed the
subjects' identifications, F(1,18) = 93.22, p < [illegible]. The four bars of
the graph show mean identification time, respectively, from left to right, for
1) the syllables in which both transition and vowel were matched, 2) those
where the transition was mismatched but the vowel was matched, 3) those where
the transition was matched but the vowel was mismatched, and 4) those syllables
where both transition and vowel were mismatched. On average, subjects were 24
msec faster in their decision when the transition was appropriate (means of
516 msec vs. 540). The inappropriateness of the vowel also slowed the
identification times, F(3,54) = 5.494, p < [illegible], by an average of 9
msec. The effect of appropriateness of transition is seen in the difference
between the first two bars as well as the difference between the second two.
The effect of appropriateness of vowel is seen in the comparison of the first
and third bars and of the second and fourth bars. Further, these two effects
were independent, F(3,54) = 0.918, n.s., for the interaction.

The experts were significantly faster than the naive subjects,
F(1,18) = 5.446, p < [illegible]. The means were 528 and 588 msec,
respectively (measured from the onset of the vowel). The interactions with the
two appropriateness factors were not significant, though, indicating that the
effects are independent of linguistic sophistication.

The vowels were chosen to contrast in rounding (/o,u/ vs. /a,i/) and
(relative) height (/i,u/ vs. /a,o/). Therefore a second analysis was
performed in which the appropriate vowel factor was split into appropriate
height (where the height of the vowel matched the height of the vowel that the
fricative was originally produced with) and appropriate rounding. Appropriate
rounding was significant as a main effect, F(1,18) = 4.625, p < .05, but
appropriate height was not, F(1,18) = 2.076, n.s. Appropriateness of the
transition did not interact with the appropriateness of the vowel for either
rounding or height, F(1,18) = 1.696, 1.129. The two types of vowel
appropriateness did interact with each other, F(1,18) = 17.846, p < .001. The
syllables in which both vowel features were appropriate were identified faster
than those where one or both were mismatched. Further work is needed to
determine the limits of vowel information in fricative noise; the current
results simply show that it is there.

Discussion 

The strongest effect from the first experiment is that inappropriate
vocalic formant transitions slow identification of a following fricative.
While this result makes sense, it is perhaps a bit unexpected. One might
assume, as did Cole and Scott (1973, p. 448), that the transitions serve only
to keep the fricative noise from "streaming" off and sounding like nonspeech.
If the transitions are only an auditory event that leads the hearer to expect
a fricative, then any transitions should do. Thus the listener could ignore
the place information in the transitions. If this "auditory" integration were
sensitive to the place of the fricative, then the transitions would in fact be
giving place information and thus be a cue. The present results indicate
that, indeed, place information in the transitions is taken into account even
when it is overridden by the more salient friction cue.

The vowel effect is less surprising and can be interpreted in terms of
coarticulation. We would expect, on articulatory grounds, that rounded vowels
would have a large effect on the spectrum of the friction. Studies of vowel
information in frictions have shown this consistently (LaRiviere et al.,
1975a; Yeni-Komshian & Soli, 1981; cf. Whalen, 1982). In the present results,
mismatches in rounding did indeed slow identification, while mismatches in
height did not. This result must be qualified, however, since the differences
in height were not as systematic as those of rounding.

In general, subcategorical phonetic mismatches can slow identification.
The next experiment was designed to see if subjects could avoid such delays
when the fricative occurred first in the utterance, that is, when the
overriding cue for the fricative preceded the mismatched cue.

EXPERIMENT 2 

Experimental Procedure 

Materials. A male native speaker of English recorded ten tokens of each
of the syllables [sa], [Sa], [su] and [Su] on magnetic tape. These were
low-pass filtered at 10 kHz and digitized at a sampling rate of 20 kHz. Two
tokens of each syllable were chosen so that the friction would be equally long
in all eight. A duration of 180 msec was found naturally in seven syllables;
the eighth was produced by removing 50 msec from a token with a longer
friction duration. The vocalic segments varied in duration, ranging from 255
to 221 msec for [a] and 225 to 188 msec for [u].

One other manipulation was carried out on the stimuli in an attempt to
see if the subjects were categorizing the fricative on the basis of the
fricative noise alone. Since the noise is the overriding cue, a fricative
judgment could be made on it alone. If subjects make their decision rapidly
enough, then shortening the friction should have no effect on the reaction
time. Since the initial portion of the noise unambiguously specifies the
fricative, the response can be initiated without waiting for the vocalic
segment. Alternatively, if reaction times vary with the duration of the
friction, this would indicate that subjects wait at least until the start of
the vocalic segment before initiating their response. A shortened version of
each friction was made by excising 50 msec from the middle of the noise. This
left the onset and offset amplitudes intact. This procedure caused no audible
discontinuity and generated no affricate percepts.

To make sure that there would be occasions on which the subjects would be 
forced to wait for the vocalic segment before responding, two conditions were 
run. In the first, only the fricative was identified; in the second, the 
whole syllable. When identifying the whole syllable, the subjects must wait
for the vocalic segment to occur before they can make their judgment. We can
then tell whether inappropriate cues have an effect in all cases, only when
the conflicting cues must be waited for, or never.

Once the tokens had been selected and the shortened frictions made, each
friction was combined with each vocalic segment. This gave 2 (short vs. long
friction) x 2 ([s] vs. [S]) x 2 ([a] vs. [u]) x 2 (vowel that the friction was
produced with is appropriate to the vowel in the combined syllable
vs. inappropriate vowel) x 2 (vocalic formant transitions are appropriate to
the friction vs. inappropriate transitions) x 2 (tokens of each vocalic
segment) x 2 (tokens of each friction) = 128 stimuli.
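
The seven two-level factors listed above can be crossed mechanically to check
the count. This is only an illustrative sketch: the dictionary keys and level
labels are names assumed here for readability, not identifiers from the
original study.

```python
from itertools import product

# Seven two-level factors, in the order given in the text.
FACTORS = {
    "friction_length": ("original", "shortened"),
    "fricative": ("s", "S"),
    "vowel": ("a", "u"),
    "vowel_match": ("appropriate", "inappropriate"),
    "transition_match": ("appropriate", "inappropriate"),
    "vocalic_token": (1, 2),
    "friction_token": (1, 2),
}

# One dict per stimulus, mapping each factor name to its level.
stimuli = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]

print(len(stimuli))

# Each cell of the 2 x 2 appropriateness design holds a quarter of the set.
both_appropriate = [
    s for s in stimuli
    if s["vowel_match"] == "appropriate" and s["transition_match"] == "appropriate"
]
print(len(both_appropriate))
```

Because every factor is fully crossed, the two appropriateness factors are
balanced against friction length, fricative, vowel, and token, which is what
allows their effects to be compared across the two response conditions.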

Each session consisted of four blocks of stimuli. Each block contained
one repetition of each of the 128 stimuli, plus four "warm-up" stimuli at the
beginning (which were not tallied in the results). The stimuli were
randomized within blocks. Test stimuli were recorded on one channel of an
audiotape, while a timing tone was recorded simultaneously on the other
channel. The inter-stimulus interval was 2500 msec.

Subjects. The subjects were 20 young adults, all native speakers of
English who had volunteered for experiments at Haskins Laboratories, and were
paid for their participation. Ten were the naive subjects from Experiment 1.
Three were left-handed.

Apparatus. Subjects were seated in a sound-attenuated booth and heard
the stimuli over TDH-39 headphones. Their responses were made by pressing one
of 2 (condition 1) or 4 (condition 2) buttons on a panel in front of them. In
condition 1, the "s" response was on the left and the "sh" response on the
right. In condition 2, the "sa" and "sha" responses were on the left, with
"sa" being directly above "sha." The "su" and "shu" buttons were arranged
similarly on the right. During the test, if the answer was correct and within
the stated time limit (longer than 100 msec, and shorter than one second for
condition 1 or one and a half seconds for condition 2), a small light on the
control box in front of them lit up. Their response time, answer, and the
correctness of that answer went into a computer file after each trial.

Procedure

The subjects were instructed to identify either the fricative (condition 
1) or the whole syllable (condition 2) as quickly as possible. They were told
to expect a few mistakes, but to slow down if they made too many. Thirty
stimuli were run but not scored to give them practice. After it had been
determined that there were no questions, two blocks were run with a
thirty-second pause between, followed by a short break. The next two blocks,
separated by a thirty-second pause, finished the session.

To see if familiarity with the task made it easier to judge the friction
alone, half the subjects were given the four-choice condition (condition 2) 
first, and half had the two-choice condition first. In each group, half the 
subjects had participated in Experiment 1 and half had not. 

Results

Only correct responses within the specified time limits were included in
the analysis of the results. Thus responses that were too long (over one or
one and a half seconds) or too short (under 100 msec) were counted as
mistakes. This gave an overall error rate of [illegible].

Figure 3 shows the results. The left half shows the results for the
condition in which only the fricative was identified; the right half shows the
results for the identification of the whole syllable. The four bars of each
half show mean identification time (collapsed across original and shortened
frictions), respectively, from left to right, for the syllables 1) in which
both transition and vowel were matched, 2) those where the transition was
mismatched but the vowel was matched, 3) those where the transition was
matched but the vowel was mismatched, and 4) those syllables where both
transition and vowel were mismatched. The effect of appropriateness of
transition, then, is seen in the difference between the first two bars as well
as the difference between the second two. The effect of appropriateness of
vowel is seen in the comparison of the first and third bars and of the second
and fourth bars.

Across conditions, inappropriate transitions significantly slowed
identification by 11 msec, F(1,16) = 12.97, p < .01. The appropriateness of
the vowel to the friction was even more significant, F(1,16) = 52.24,
p < .001, with a delay of 20 msec for inappropriateness. The inappropriateness
of the vowel slowed responses more (by 27 msec to 14) when the transitions
were inappropriate, F(1,16) = 8.01, p < [illegible]. The difference between
the two conditions was highly significant, F(1,16) = 105.05, p < .001. Since
this compared a two-choice test with a four-choice one, the difference is no
surprise.

The results for shortened versus original frictions, collapsed over
appropriateness of vowel, are shown in Figure 4. (The results with the vowel
mismatched were in accordance with the predictions.) The first two columns of
each half represent the times for the syllables with the original frictions;
the next two, those with the shortened frictions. The first columns of each
of those pairs represent the syllables with appropriate transitions, the
second, those with inappropriate transitions. Syllables with shortened
frictions were identified faster than the originals overall by an average of
33 msec, F(1,16) = 204.05, p < .001. Still, the speed advantage of the
shortened stimuli was significantly larger in the whole syllable condition
than in the fricative condition, F(1,16) = 60.04, p < .001: The shortened
frictions resulted in a 46 msec gain in reaction time when the whole syllable
was identified, but only 19 msec when the fricative was identified.

These main results conform to the predictions. In the results for the
identification of the whole syllable, however, there was one anomaly. The
syllables with inappropriate transitions but appropriate vowels were
identified faster than the syllables with both transition and vowel
appropriate (see Figure 3). This did not result in a significant interaction
between condition and appropriateness of transition, F(1,16) = 1.26, n.s.
However, the triple interaction of condition and appropriateness of vowel and
of transition was significant, F(1,16) = 8.75, p < .01. In the whole syllable
condition, inappropriateness of the transition slowed identification only if
the vowel was inappropriate as well. This unexpected behavior also contributed
to the interaction of appropriateness of vowels and condition,
F(1,16) = 22.92, p < .01. The delay for syllables with inappropriate vowels
was 50 msec when the whole syllable was identified, compared with only 11 when
just the fricative was judged.

A further set of interactions reveals that the anomaly is limited to the
syllables containing shortened frictions (see Figure 4). In the fricative
condition, inappropriate transitions slowed responses both for the original
and the shortened frictions. In the syllable condition, however, making the
transitions "inappropriate" actually speeded the decision 3 msec with the
shortened friction; the syllables with the original friction showed the
expected pattern, F(1,16) = 11.55, p < .01. Even across conditions,
appropriateness of transition and shortened friction interacted. When the
transitions were appropriate, there was less of an advantage for having the
short friction (26 msec compared with 39), F(1,16) = 15.46, p < .01. The same
held for appropriateness of the vowel (22 msec vs. 41), F(1,16) = 9.35,
p < [illegible]. There was a further interaction of condition and
appropriateness of vowel and of transition with length, F(1,16) = 5.71,
p < [illegible]. There was one group of stimuli, the syllables with shortened
frictions and inappropriate transitions, that behaved unexpectedly when the
whole syllable was identified.

Neither prior experience nor order of conditions had a significant effect
on reaction time, F(1,16) = 0.29, 0.075, respectively, n.s. The interaction was
not significant either, F(1,16) = 0.65, n.s. These two variables interacted
with the conditions variable, F(1,16) = 7.00, p < [illegible]. No natural
explanation for the interaction is obvious. More important is the lack of any
interaction with the two appropriateness factors.

Discussion

Once again, mismatching the transitions, while it did not change the
phonetic identity of the fricative, did slow identification, in this case of
both the fricative and the syllable the fricative was in. Mismatch of the
vowel and the vowel that the fricative was originally produced with was a more
significant factor in this experiment than in the previous one. In the
four-choice condition, this is natural, since the information in the noise
could be a partial cue to the identity of the vowel. Yet even in the two-choice
condition, where the subject could, in principle, make her decision before she
even hears the vowel, there is an effect. Further, the mismatched cues still
slow the identification even though the overriding cue is heard first.
Therefore the results support an "integrating" account and cast doubt on any
"disposing" accounts. (See the General Discussion for a treatment of a
disposing account with a large time window.)

If, in the two-choice condition, subjects were basing their decision
about the fricative on the noise alone, we might expect the following three
patterns to emerge: 1) Inappropriateness of transition would have an effect
only in the four-choice condition, where the subject is required to listen to
the whole syllable. 2) Similarly, inappropriateness of vowel would have an
effect only in the four-choice condition. 3) In the two-choice condition,
there would be no difference in response times for original and shortened
frictions. None of these expectations is fulfilled. However, there is a
tendency in the direction of fulfilling the last two, so the following
revision is worth considering: In the two-choice condition, subjects can
occasionally succeed in making their decision before the vocalic segment
reaches them. In those cases, the judgment would be "unaffected" by the
vocalic segment and the above-mentioned expectations would hold. When the
subject is not able to ignore the vocalic segment (is "affected" by it), the
expectations do not hold; the result would be a mixture of responses in which
the effects of conflicting cues are weakened in the two-choice condition.
However, two major pieces of evidence conflict with this interpretation.
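
Before that evidence is weighed, the revised account can be made concrete as
a two-component mixture: on some proportion of two-choice trials the response
waits for the vocalic segment, and on the rest it does not. The sketch below
works through that arithmetic using the 50 msec excision and the 19 and 46
msec gains reported in the Results. The function names are assumptions
introduced for illustration, and the linear mixture is a simplification
adopted here, not a model fitted in the original study.

```python
def expected_gain(p_affected: float, full_gain_ms: float) -> float:
    """Mean speed-up from shortening the friction under a simple mixture.

    A proportion p_affected of responses wait for the vocalic segment and
    so gain the full amount; the remaining responses gain nothing.
    """
    return p_affected * full_gain_ms


def implied_p_affected(observed_gain_ms: float, full_gain_ms: float) -> float:
    """Invert the mixture: what proportion of responses must be 'affected'?"""
    return observed_gain_ms / full_gain_ms


CUT_MS = 50.0               # duration excised from the friction
GAIN_TWO_CHOICE_MS = 19.0   # gain when only the fricative was identified
GAIN_FOUR_CHOICE_MS = 46.0  # gain when the whole syllable was identified

# Proportion of "affected" two-choice responses, relative to the full cut
# and relative to the gain actually observed in the four-choice condition.
p_vs_cut = implied_p_affected(GAIN_TWO_CHOICE_MS, CUT_MS)
p_vs_four_choice = implied_p_affected(GAIN_TWO_CHOICE_MS, GAIN_FOUR_CHOICE_MS)

print(round(p_vs_cut, 2), round(p_vs_four_choice, 2))
print(round(1 - p_vs_cut, 2), round(1 - p_vs_four_choice, 2))
```

On either baseline the implied proportion of "affected" responses is roughly
two-fifths, so the mixture account would require about three-fifths of the
two-choice decisions to be made on the noise alone.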

First, the transition effect is equally strong in the two-choice and
four-choice conditions. That is, identification is slowed equally by
mismatches in transition whether the whole syllable is identified or only the
fricative. If subjects were basing their decision only on the noise, we would
expect no effect of mismatched transitions when only the fricative was
identified. For the transitions to have an effect, they must be heard. To be
heard, at least the beginning of the vocalic segment must be heard. Thus even
if the vowel itself was ignored, the 50 msec difference in time should have
shown up, as it did in the whole-syllable condition. The difference, however,
was only 19 msec.

Second, the higher level interactions show that the division of fricative
identifications into "affected" and "unaffected" responses is not
straightforward. The time advantage brought about by shortening the friction is quite
suggestive: In the four-choice condition, the gained speed (46 msec) is
almost equal to the cut in duration (50 msec). For the two-choice condition,
the gain is only two-fifths of that (19 msec). This would lead us to expect
that subjects could make their decision on the noise alone approximately
three-fifths of the time. The discussion of the last paragraph casts doubt on
this proportion; other interactions involving inappropriateness of vowel do
the same. If decisions were either "affected" or "unaffected," then
mismatched vowel and transition cues would either slow decisions equally (in the
affected identifications) or be ignored together (in the unaffected cases).
Thus there should be an interaction between appropriateness of transition with
condition and interaction between appropriateness of vowel with condition, but
no interaction of the three. In fact, the transition effect is unaffected by
condition, the vowel effect is weaker in the identification of just the
fricative, and the interaction of all three is significant. The interaction
of appropriateness of vowel and transition itself goes against any simple
explanation of the effects of the mismatch.
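
The mixture reasoning above can be made concrete with a short calculation. The sketch below assumes an all-or-none mixture in which an "affected" response gains the full 50 msec cut and an "unaffected" response gains nothing; the function name and that simplification are illustrative assumptions, not part of the original analysis.

```python
def affected_proportion(observed_gain_ms, cut_ms):
    """Share of responses that waited for the vocalic segment, under an
    all-or-none mixture: affected trials gain cut_ms, unaffected gain 0."""
    if cut_ms <= 0:
        raise ValueError("cut_ms must be positive")
    return observed_gain_ms / cut_ms


# Gains reported in the text for a 50 msec cut in friction duration.
two_choice = affected_proportion(19, 50)   # fricative-only condition
four_choice = affected_proportion(46, 50)  # whole-syllable condition

print(f"two-choice: affected {two_choice:.2f}, unaffected {1 - two_choice:.2f}")
print(f"four-choice: affected {four_choice:.2f}")
```

Under this assumption the two-choice gain corresponds to roughly two-fifths affected and three-fifths unaffected responses, which is the proportion the text goes on to question.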

It thus appears that, whatever the explanation of the effect of
shortening the friction, subjects are not ignoring the vocalic segment in any of
their judgments. This is not always the case, as is shown in Repp (1981a).
In an experiment that tested only identifications of the fricatives [s] and
[ʃ], Repp showed that inappropriate transitions did not affect reaction time.
Shortening the noise by 50 msec resulted in a significant reduction in
reaction time, but the difference was only 8 msec. The subjects in the
present experiment may have been more inclined to pay attention to the vocalic
segment since half of them participated in the four-choice (identification of
whole syllable) condition before the two-choice (identification of fricative
only) condition. In addition, some of Repp's subjects had recently
participated in fricative discrimination studies, in which they had to concentrate on
the spectrum of the noise. However, the lack of an effect of vocalic context
does not fit well with the shortened reaction times for shortened frictions,
even if the difference is smaller.

Some of the interactions might lead to the following proposal: The most
typical noise will give the fastest time in all environments. Repp (1981a)
also had some evidence that this might be the case for [s]. The noise of [s]
is high in frequency, and unrounded vowels result in higher noises for
coarticulated fricatives. The converse holds for [ʃ]. With the current
stimuli, the [s] noise from [sa] is the most decidedly [s], and the [ʃ] noise
from [ʃu] is the most decidedly [ʃ]. We might expect responses to those
noises to be the shortest. For the present data, this is not the case, even
when the identification of the fricatives alone is considered. Instead, the
identification seems to be sensitive more to appropriateness than absolute
typicality.

Many complicated factors seem to be involved in the perception of these
modified stimuli. While the exact nature of these factors would require a
series of tests manipulating the acoustic structure in a more detailed
fashion, the main point is clear: Mismatch of cues results in a delay in
identification. The next experiment will demonstrate this result with stops.

EXPERIMENT 3

Stop release bursts are in many ways equivalent to fricative noises.
They are noises within limited frequencies, and they provide substantial
consonant information and some vowel information. The third experiment of
this series explores the behavior of mismatched burst cues. In this case the
two mismatched cues were combined in one element, the burst, so that both the
inappropriate vowel and consonantal information preceded the overriding cues
in the transitions and the steady-state vocalic section.

The four-choice condition of the previous experiment, in which the whole
syllable was identified, was replaced with one in which only the vowel was
identified. Differences between the identifications of the consonant and of
the vowel would have a better chance of emerging if the different tasks were
more similar. Also, the subject must still wait for the mismatched cues to
occur before identifying the vowel, yet the task of choosing between two vowel
categories is much easier than that of choosing among four syllable
categories.

Experimental Procedure

Materials. A male native speaker of English recorded ten tokens of each
of the syllables [pa], [pu], [ka], and [ku] on magnetic tape. These were
low-pass filtered at 10 kHz and digitized at a sampling rate of 20 kHz. Two
tokens of each syllable were chosen, with the requirement that the release
burst of the stop be 5 msec in duration. The burst was defined as a segment
of noise with an amplitude rise and fall occurring before the aspirated
formant transitions. The syllables were either 500 msec in duration (with
[a]) or 350 msec (with [u]). All the [u]'s were of a much shorter duration,
and there was no pressing need to have the stimuli of exactly the same
duration, so the syllables were not modified.

Once the tokens had been selected, the bursts were isolated and then
recombined with each vocalic segment. The vocalic formant transitions were
the overriding cue in all cases for the experimenter. Some subjects
complained of disagreement, especially in the [u] syllables. A non-speeded
identification of the consonants was added to the experiment to assess the
magnitude of the disagreement.

The mismatched cue, the burst, again came before the deciding cue, that
is, the vocalic formant transitions. The resulting 64 stimuli fell into four
categories similar to those that were of interest before: 1) The information
in the burst matched both the transitions and the vowel; 2) The vowel information
matched but the stop information conflicted; 3) The stop information matched
but the vowel information conflicted; 4) Both the vowel and the stop
information in the burst conflicted with the transitions and vowel of the
syllable.
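
The crossing of bursts with vocalic segments described above can be sketched as follows. This is a minimal illustration, assuming that each of the eight selected tokens contributes one burst and one vocalic segment; the tuple layout and the names `sources` and `category` are hypothetical and not taken from the original procedure.

```python
from collections import Counter
from itertools import product

STOPS = ("p", "k")
VOWELS = ("a", "u")
TOKENS = (1, 2)  # two recorded tokens of each syllable were selected

# Each source token is identified by (stop, vowel, token number).
sources = list(product(STOPS, VOWELS, TOKENS))


def category(burst, vocalic):
    """Return 1-4, following the numbering of the categories in the text."""
    stop_ok = burst[0] == vocalic[0]
    vowel_ok = burst[1] == vocalic[1]
    if stop_ok and vowel_ok:
        return 1  # burst matches both transitions and vowel
    if vowel_ok:
        return 2  # vowel information matches, stop information conflicts
    if stop_ok:
        return 3  # stop information matches, vowel information conflicts
    return 4      # both kinds of information conflict


# Every burst is combined with every vocalic segment.
stimuli = [(b, v, category(b, v)) for b, v in product(sources, sources)]
counts = Counter(cat for _, _, cat in stimuli)
print(len(stimuli), dict(counts))
```

With eight bursts and eight vocalic segments the crossing yields the 64 stimuli mentioned in the text, and under the stated assumption the four categories are equally populated.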

Each session consisted of two conditions: judging the consonant and
judging the vowel. Two blocks of stimuli occurred in each condition. Each
block contained 128 trials, plus four "warm-up" stimuli at the beginning
(which were not tallied in the results). Two tokens of each stimulus occurred
within each block in random order. The stimuli were recorded on one channel
of an audiotape while, on the other channel, a timing tone was recorded
simultaneously with the onset of the stimulus. The inter-stimulus interval
was 2500 msec.
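
The block structure just described (two presentations of each of the 64 stimuli in random order, preceded by four untallied warm-up trials) can be sketched as below. How the warm-up stimuli were chosen is not stated, so drawing them at random is an assumption, as are the function name and the integer stimulus labels.

```python
import random


def build_block(stimuli, repeats=2, n_warmup=4, seed=None):
    """Return a list of (stimulus, scored) pairs for one block.

    Warm-up trials come first and are flagged scored=False. Assumption:
    warm-ups are drawn at random from the stimulus set.
    """
    rng = random.Random(seed)
    scored = [s for s in stimuli for _ in range(repeats)]
    rng.shuffle(scored)
    warmup = rng.sample(stimuli, n_warmup)
    return [(s, False) for s in warmup] + [(s, True) for s in scored]


stimulus_ids = list(range(64))  # stand-ins for the 64 recombined syllables
block = build_block(stimulus_ids, seed=1)
n_scored = sum(1 for _, scored in block if scored)
print(len(block), n_scored)
```

Fixing `seed` makes a block reproducible, which is convenient when the same order has to be recorded to tape once and replayed for every subject.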

Subjects. Two groups of subjects were tested, expert and naive. The
expert listeners were 10 researchers at Haskins Laboratories, all of whom were
phonetically trained. Eight had participated in Experiment 1. Two were
left-handed. The naive subjects were 10 young adults, all native speakers of
English who had volunteered for experiments at Haskins Laboratories, and were
paid for their participation. Nine had participated in Experiments 1 and 2.
One was left-handed.

Apparatus. Subjects were seated in a sound-attenuated booth and heard
the stimuli over TDH-39 headphones. Their responses were made by pressing one
of two buttons on a panel in front of them. In the consonant condition, the
"p" response was on the left and the "k" response on the right. In the vowel
condition, the "a" response was on the left and the "u" response on the right.
During the test, if the answer was correct and within the stated time limit
(longer than 100 msec and shorter than one and one half seconds for the
consonant condition, shorter than one second for the vowel condition), a small
light on the control box in front of them lit up. Their response time,
answer, and the correctness of that answer went into a computer file after
each trial.
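
The feedback rule above amounts to a per-condition response window. The sketch below is one way to express it; the names are hypothetical, and applying the 100 msec lower bound to the vowel condition as well is an assumption, since the text states that bound explicitly only for the consonant condition.

```python
# Response windows in msec. The 100 msec floor for the vowel condition is an
# assumption; the text gives a lower bound only for the consonant condition.
LIMITS_MS = {"consonant": (100, 1500), "vowel": (100, 1000)}


def is_scored_correct(condition, response, target, rt_ms):
    """True when the right button was pressed inside the allowed window."""
    low, high = LIMITS_MS[condition]
    return response == target and low < rt_ms < high


def error_rate(trials):
    """trials: iterable of (condition, response, target, rt_ms) tuples."""
    trials = list(trials)
    errors = sum(not is_scored_correct(*t) for t in trials)
    return errors / len(trials)


demo = [
    ("consonant", "p", "p", 420),   # correct and in the window
    ("consonant", "k", "p", 390),   # wrong button
    ("vowel", "a", "a", 1200),      # right button, too slow for vowels
    ("consonant", "p", "p", 1200),  # same latency is allowed for consonants
]
print(error_rate(demo))
```

Counting out-of-window responses as errors mirrors the way the Results sections tally mistakes.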

Procedure

The subjects were instructed to identify the consonant or vowel as
quickly as possible. They were told to expect a few mistakes, but to slow
down if they made too many. Since subjects were not unanimous in their
judgment of the stop identity, they were told to expect to disagree with the
feedback in some instances. The feedback light was explained to them. Thirty
stimuli were run but not scored to give them practice. After it had been
determined that there were no questions, two blocks were run with a
thirty-second pause between, followed by a short break. The next condition,
consisting of another two blocks separated by a thirty-second pause, finished
the session. Order of the conditions was counterbalanced across subjects.

After the reaction time experiments were over, the first block was
presented again for non-speeded identification of the consonants. These
results were tallied separately from the speeded identifications.

Results

Only correct responses within the specified time limits were included in
the analysis of the results. Thus, responses that were too long or too short
(under 100 msec) were counted as mistakes. This gave an overall error rate of
4.6%.

Figure 5 shows the results in a way that is parallel to the previous
results. The effect of the appropriateness of the transition was significant,
F(1,18) = 7.[illegible]79, p < .05. On average, subjects were 4 msec faster in their
decision when the transition was appropriate. The effect was only present
when the consonant was identified. This is shown by the interaction of
condition with appropriateness of transition, F(1,18) = 14.308, p < .01.
Inappropriate transitions slowed identification of the consonants by
[illegible] msec, but sped identification of the vowel by 3 msec.

Inappropriate vowels did not slow identification significantly,
F(1,18) = 1.080, n.s., despite a trend of 2 msec in that direction.
Misidentifications of the stop may have obscured this result, so an analysis
was done of the data for syllables containing the vowel [a]. The
identification of the stops in these syllables was correct 99.[illegible]% of the time for all
subjects. These results were analyzed in the same manner as the full test
results. Inappropriateness of transition did not have any effect,
F(1,18) = 0.40, n.s., but inappropriateness of vowel did, F(1,18) = 6.99, p < .05,
for a delay of 7 msec.

The experts were significantly faster than the naive subjects,
F(1,18) = 9.067, p < .01. The means were 378 and 500 msec, respectively. This
factor was involved in no significant interactions.

Results for the non-speeded identification of the consonants appear in
Table 1. They are summarized as percentage of misidentifications of the
consonants. Results are collapsed across consonant and vowel category, and
are divided in the same manner as the results displayed in Figure 5. The rate
of misidentification corresponds to increase in reaction time, but it is not
certain that ambiguity in the stimuli is sufficient to account for the
results. Four of the subjects accounted for 48.7% of the misidentifications.
The other sixteen subjects were correct at least 94.5% of the time. A second
analysis was done on the 10 subjects with the highest accuracy. There were no
changes in the variables and interactions that were significant. However, the
misidentifications still parallel the reaction times (see Table 1).



[Figure 5 graphic: Times for identification of initial stops and for vowels. Panels: identification of stop only; identification of vowel only. Legend: transition and vowel matched; transition mismatched, vowel matched; transition matched, vowel mismatched; transition and vowel mismatched.]

Figure 5. Times to identify the stop or the vowel, Experiment 3.

Discussion



Overall, inappropriate consonantal information in the burst slowed
reaction time. This effect, however, did not appear in the results for the
syllables with [a]. Overall, making the vowel information in the stop burst
inappropriate to the vowel does not slow identification of that stop. When
the results for syllables with [a] are considered alone, however,
inappropriateness of vowel does slow reaction time. While these results confirm the
previous results for the fricatives to some extent, they must be treated with
caution.

Since the bursts were necessarily chosen for their minimal place
information, their lack of a slowing effect is not too surprising. The acoustic
effects of the articulation on the burst are less clear than the effect on
fricative noise, and the stop and transitions interact in complex ways. The
stop can be identified to some extent from the burst alone (Kewley-Port, 1980;
Tekieli & Cullinan, 1979; Winitz, Scheib, & Reed, 1971), but these bursts must
contain less place information for this experimental design.

Vowels can be identified much better than chance from the friction of a
coarticulated fricative by itself (Yeni-Komshian & Soli, 1981). The vowel
information in release bursts is generally poor, even for bursts of longer
duration than the ones used here (Cullinan & Tekieli, 1979; Kewley-Port,
1980). Thus any delay caused by inappropriate vowel information may actually
be due to the burst's being taken as appropriate to a stop not among the
choices in the task.

Although the vowel effect in the stop syllables is promising, the results
of this experiment do not provide strong support for the notion that
subcategorical mismatches slow phonetic judgments. For this phenomenon to be
studied with stops, it is apparent that more control over the stimuli is
needed, which is probably available only in synthesis.

EXPERIMENTS 4 AND 5

In Experiments 1 and 2, formant transitions have been shown to provide
information about the fricative that cannot be completely ignored even when
that information does not determine the category judgment. If the transitions
were taken to give information about a segment other than the fricative,
however, we would expect them not to affect the speed with which the fricative
is identified. One way to make the transitions "affiliate" with another phone
is to insert silence artificially between the friction and the vocalic segment
(cf. Best, Morrongiello, & Robson, 1981; Mann & Repp, 1980). With a
sufficient amount of silence preceding, transitions can be perceived as stops in
fricative-stop clusters.

When 60 msec of silence was introduced between the friction and the first
pitch pulse of the fricative-vowel syllables from Experiment 2, stop percepts
resulted in about half the cases. Generally, the [ʃ] transitions yielded
stops, while the [s] transitions were usually perceived as an interdental
fricative [θ]. The unexpectedness of this result led to a reexamination of
the particular stimuli used. As seen in Figure 6, there is a portion of the
noise just before the onset of voicing that is much lower in amplitude than
the rest of the friction (as seen in the waveform), and that has recognizable
traces of formant transitions (as seen in the spectrogram). This token of
[ʃa] is typical of the eight syllables used in Experiment 2. Although the
first pitch pulse has been used as a demarcation between fricative and vowel
(including transition) in previous experiments, the transitions need not begin
with voicing. When the fricative gesture ends and the vowel gesture begins,
there can be a brief period when the tongue is not close enough to the roof of
the mouth to produce real friction but voicing has not started. What results
then is essentially aspiration. This aspiration can be seen as part of the
transitions, just as it is with voiceless stops.

When these observations are taken into account, it is clear that there is
just as much justification for treating the "aspiration" as part of the
transitions as for excluding it. If the onset of voicing defines a point that
excludes some of the transition, it is not as surprising that introducing
silence at that point will not always result in the perception of a stop. The
"aspiration" deserves to go with the vocalic segment as well. In fact, when
an appropriate amount of silence is introduced 10 msec before the onset of
voicing (thus including a portion of aspirated transitions with the vocalic
portion), stop percepts result with all the syllables of Experiment 2.
Stimuli with 60 msec of silence inserted 10 msec before the first pitch pulse
were then chosen for an experiment to determine whether the differing
transitions slowed identification even when they affiliated with another
phone, in this case, a stop. To justify the original result, however, the new
location had to be tested in the original paradigm. Experiment 4, therefore,
is a replication of Experiment 2, and Experiment 5 tests the theory that the
transition effect will disappear when the transitions affiliate with a
different phone.
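
The splice described above (60 msec of silence placed 10 msec before the first pitch pulse) can be sketched on a list of waveform samples. This is an illustration only: the 20 kHz rate is the one reported for the Experiment 3 recordings and is assumed here for the fricative syllables, the voicing-onset index is taken as given, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 20_000  # assumed; the rate reported for Experiment 3


def ms_to_samples(ms, rate=SAMPLE_RATE_HZ):
    """Convert a duration in msec to a whole number of samples."""
    return round(ms * rate / 1000)


def insert_silence(samples, voicing_onset, silence_ms=60, lead_ms=10,
                   rate=SAMPLE_RATE_HZ):
    """Insert silence lead_ms before the first pitch pulse.

    samples: list of waveform values; voicing_onset: index of the first
    pitch pulse. The lead_ms of voiceless transition stay with the vocalic
    segment, after the inserted gap.
    """
    cut = voicing_onset - ms_to_samples(lead_ms, rate)
    if cut < 0:
        raise ValueError("voicing onset is earlier than the requested lead")
    gap = [0] * ms_to_samples(silence_ms, rate)
    return samples[:cut] + gap + samples[cut:]


syllable = [1] * 10_000            # stand-in for a 500 msec syllable
with_gap = insert_silence(syllable, voicing_onset=3_000)
print(len(with_gap) - len(syllable))  # number of samples added
```

Setting `lead_ms=0` reproduces the earlier splice point at the first pitch pulse, which yielded stop percepts in only about half the cases.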

EXPERIMENT 4

The four-choice condition of Experiment 2, in which the whole syllable
was identified, was again replaced with one in which only the vowel was
identified. In addition to the reasons for this revised procedure given above
for Experiment 3, there was the added necessity of comparing Experiments 4 and
5. Since the syllables of Experiment 5 consist of three phones, it would be
difficult for the subjects to identify only the first and third.

Figure 6. Illustration of the low-amplitude, voiceless transitions, from the
syllable [ʃa].

Experimental Procedure

Materials. The syllables [sa], [ʃa], [su] and [ʃu] from Experiment 2
were used. The shortened versions of the frictions were not used. Thus there
were eight fricative and eight vocalic segments (since two tokens of each type
were used), with the difference being that the vocalic segments now contained
ten msec of voiceless transitions, and the frictions were correspondingly
shorter. Again, each friction was combined with each vocalic segment,
including the one it was originally produced with. This resulted in 64 unique
stimuli, comprising the same groups of interest: 1) both transitions and
vowel quality were appropriate; 2) transitions were appropriate but vowel
quality was mismatched; 3) vowel quality was matched but transitions were not;
and 4) both transitions and vowel quality were inappropriate.

Procedure. Each session consisted of two conditions. In one, subjects
identified the fricative as quickly as possible; in the other, they identified
the vowel. An unscored practice block of thirty stimuli was given before each
condition. Each condition consisted of two blocks separated by a
thirty-second pause. The order of the conditions was counterbalanced across
subjects. The general procedure was the same as Experiment 2. In the fricative
condition, the "s" response button was on the left and the "sh" on the right.
In the vowel condition, the "a" button was on the left and the "u" on the
right.

Subjects. Two groups of subjects were tested, expert and naive. The
expert listeners were 10 researchers at Haskins Laboratories, all of whom were
phonetically trained, had experience in phonetic research, or both. One
was left-handed. The naive subjects were volunteers who were paid for their
participation. None was left-handed.

Results

The error rate was 4.3% overall. Answers longer than one second (in both
conditions) were counted as errors.

Figure 7 shows the results in the same manner as before. Inappropriate
transitions resulted in a significant 6 msec delay, F(1,18) = 23.35, p < .01.
Inappropriate vowels caused a 12 msec delay, F(1,18) = 28.43, p < .01. These
two factors were again independent, F(1,18) = 1.85, n.s.
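
The significance levels attached to these F ratios can be checked numerically. The sketch below relies on the standard identity that an F statistic with 1 numerator degree of freedom is the square of a t statistic with the same error degrees of freedom, and integrates the t density with Simpson's rule. It is an independent check written for illustration, not the analysis procedure used in the experiments, and the function names are hypothetical.

```python
import math


def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f1(f_stat, df_error, steps=2000):
    """p value for F(1, df_error), using F = t**2 and Simpson's rule.

    steps must be even for Simpson's rule.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_density(0, df_error) + t_density(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_error)
    area = total * h / 3  # integral of the density from 0 to t
    return 1 - 2 * area


for label, f_stat in [("transition", 23.35), ("vowel", 28.43), ("interaction", 1.85)]:
    print(f"{label}: F(1,18) = {f_stat}, p = {p_value_f1(f_stat, 18):.4f}")
```

The same function can be applied to any of the F(1,18) values reported in this paper to see which side of the .05 and .01 thresholds they fall on.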

Identification of the fricative was faster than that of the vowel by an
average of 68 msec, F(1,18) = [illegible]9.82, p < .01. The slowing effect of
inappropriate transitions was the same whether the vowel or the fricative was
identified, F(1,18) = 0.03, n.s. The vowel effect, on the other hand, was smaller
when the fricative had to be identified, F(1,18) = [illegible].66, p < .05.

The expert subjects were 47 msec faster than the naive subjects (435
vs. 482 overall mean), but this difference was not significant, F(1,18) = 2.382,
n.s. None of the interactions with the expert/naive factor was significant.

Figure 7. Times to identify the fricative or the vowel, Experiment 4.

Figure 8. Times to identify the fricative or the vowel, Experiment 5.

Discussion

As before, mismatching the transitions, while it did not change the
phonetic identity of the fricative, did slow identification. In this case,
the identification was either of the fricative or just of the vowel. The
delay caused by the inappropriateness of the vowel was again larger than that
caused by inappropriate transitions (12 msec vs. 6 msec). However, the
transition effect was more reliable. Also, the transition effect does not
weaken in the two-choice condition. That is, identification is slowed equally
by mismatches in transition whether the vowel is identified or only the
fricative, as we would expect from Experiment 2.

Some of the finer details of this experiment and Experiment 2 do not
match, but the overall picture is clear. Inappropriateness of transition
leads to a delay in phonetic identification of both the fricative and the
vowel; inappropriateness of vowel gives a similar, somewhat larger delay.
These two effects are independent. The next experiment explores the effect of
the transitions when they affiliate with a phone other than the fricatives
they were originally produced with.

EXPERIMENT 5 

Experimental Procedure 

Materials. The syllables [sa], [ʃa], [su] and [ʃu] from Experiment 4
were used, but with 60 msec of silence inserted between the friction and the
vocalic segment. This gave rise to a stop percept in all combinations. The
procedure was otherwise the same as for Experiment 4.

Subjects. The subjects of Experiment 4 participated.

Procedure. The procedure of Experiment 4 was used.

Results 

The error rate was 4.1% overall. Answers longer than one second (in both
conditions) were counted as errors.

Figure 8 shows the results in the same fashion as the previous
experiments. Inappropriate transitions resulted in a significant 4 msec delay,
F(1,18) = 5.41, p < .05. Inappropriate vowels caused a 17 msec delay,
F(1,18) = 81.99, p < .01. As before, the slowing effect of inappropriate
transitions was the same whether the vowel or the fricative was identified,
F(1,18) = 0.3[illegible], n.s. This time, however, the vowel effect was also the same in
both conditions, F(1,18) = 2.25, n.s.

Subjects were significantly slower (by 124 msec) in identifying vowels
than fricatives, F(1,18) = 39.69, p < .01. Note that this is almost exactly 60
msec more than the difference in Experiment 4 (without the 60 msec of
silence).

The expert subjects were again faster (this time by 47 msec) than the
naive subjects (484 vs. 531 overall mean), but this difference was not
significant, F(1,18) = 2.37, n.s. There were no interactions with this factor.




Subcategorical Phonetic Mismatches Slow Phonetic Judgments

An analysis that compared Experiments 4 and 5 was run. This revealed
three interactions of interest. First, responses were slower to the syllables
with inserted silence (459 vs. 507 msec), F(1,18) = 23.26, p < .01. This was
due largely to the vowel identification, F(1,18) = 23.26, p < .01 (see Table 2).
Since the syllables in Experiment 5 were 60 msec longer than those of
Experiment 4, it is natural that the vowel judgments should be slower by
approximately that much. The consonant judgments were also slower in
Experiment 5. A separate analysis of variance of just the fricative
identifications, however, shows that this difference is not significant, F(1,18) = 1.81,
n.s. This indicates that, while the listener is waiting long enough to
integrate the information of the vocalic segment into the fricative percept,
she does not need to wait for the syllable to finish before she makes her
judgment.



Table 2

Mean Reaction Times (in msec) for Identification of Fricative vs. Vowel
Occurring in Experiments 4 and 5

            fricative    vowel

Exp 4          425        493

Exp 5          445        569
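
The comparisons drawn from Table 2 are simple arithmetic on its four cell means, and can be reproduced as follows. The dictionary layout and function names are illustrative choices; the numbers are the ones given in the table.

```python
# Mean reaction times in msec from Table 2.
rt = {
    4: {"fricative": 425, "vowel": 493},
    5: {"fricative": 445, "vowel": 569},
}


def vowel_minus_fricative(experiment):
    """How much slower vowel identification is than fricative identification."""
    return rt[experiment]["vowel"] - rt[experiment]["fricative"]


def overall_mean(experiment):
    """Mean over the two identification tasks."""
    return sum(rt[experiment].values()) / len(rt[experiment])


gap_4 = vowel_minus_fricative(4)
gap_5 = vowel_minus_fricative(5)
print("vowel - fricative:", gap_4, gap_5, "increase:", gap_5 - gap_4)
print("overall means:", overall_mean(4), overall_mean(5))
print("slowing with silence:",
      {task: rt[5][task] - rt[4][task] for task in ("fricative", "vowel")})
```

The increase in the vowel-fricative gap between the two experiments is the quantity the text compares with the 60 msec of inserted silence.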



The prediction that the effect of inappropriate transitions would be
greatly reduced is fulfilled. While the absolute duration is not much shorter
(4 msec vs. 6 msec), the transition effect is much less reliable in Experiment
5, F(1,18) of 5.41 for Experiment 5 vs. 23.35 for Experiment 4. It might seem
that this is the result merely of physical separation of the two cues
(friction and transition). With the same separation, however, the vowel
effect strengthened, both in duration of the delay and its reliability: 12
msec, F(1,18) of 28.43 for Experiment 4 vs. 17 msec, F(1,18) of 82.00, for
Experiment 5.

Discussion 

Inserting silence between the friction and the vocalic segment so that a
stop was perceived did not change the perceived phonetic category of the
fricative. Nonetheless, the mismatch of transitions did slow the subjects
somewhat. The delay caused by the inappropriateness of the vowel was again
larger than that caused by inappropriate transitions (17 msec vs. 4 msec). In
this instance, the vowel effect was much more reliable.

The transitions of [u] vocalic segments did not significantly affect
reaction time, while the effect of inappropriate vowel quality was much
greater for [u] than for [a]. Neither pattern showed up in Experiment 4.
Since the transitions for high vowels are shorter than those for low vowels,
it could be that only the transitions of low vowels give information about a
preceding fricative. The previously noted effect of stop-affiliated
transitions on fricative percepts (Repp & Mann, 1981) used only the vowel [a].

Even with added silence and a new (stop) percept, the general pattern
established in the previous experiments remains: Inappropriateness of
transition (in the one case where such an effect had been shown in perceptual
studies previously) leads to a delay in phonetic identification of both the
fricative and the vowel; inappropriateness of vowel gives a similar, somewhat
larger delay. These two effects are independent.

The prediction that the effect of inappropriate transitions would be
greatly reduced by the insertion of silence (Experiment 5) is fulfilled.
While the absolute duration of the delay caused by inappropriate transitions
is not much shorter (4 msec vs. 6 msec), the transition effect is much less
reliable in Experiment 5, F(1,18) = 5.41, than in Experiment 4, F(1,18) = 23.35.
It might seem that this is the result merely of physical separation of the two
cues (friction and transition). With the same separation, however, the vowel
effect strengthened, both in duration of the delay caused by inappropriateness
and the reliability of the effect: 12 msec, F(1,18) of 28.43, for Experiment
4 vs. 17 msec, F(1,18) of 82.00, for Experiment 5.

GENERAL DISCUSSION AND CONCLUSION

The five experiments described in this paper provide convincing evidence
that listeners take cues into account even when those cues seem both
superfluous and ineffective. The vowel information in fricative noises and
stop bursts and the consonant information in vocalic formant transitions both
are generally too weak to do more than cause subcategorical variation, yet
reliably slow down identifications if they are inappropriate. This slowing
occurs whether the information pertains to the particular phone being
identified, or to the phone that just happens to be presented at the same time. And
finally, the mismatches cause just as much delay whether they precede or
follow the overriding cue.

This last result is further evidence that listeners do not interpret the
speech stream in a strictly left to right fashion. Other evidence to that
effect has been found. For example, Repp, Liberman, Eccardt, and Pesetsky
(1978) found that a stretch of silence was or was not treated as a cue to stop
manner depending on the phonetic judgment made on the next segment. Miller
and Liberman (1979) and Miller (1981) found that speaking rate, as determined
by length of a following vowel, influenced the [b]-[w] boundary. Both these
and other instances of later information affecting an earlier boundary involve
timing. Various "disposing" theories (e.g., Klatt, 1979) have incorporated
methods of withholding certain phonetic judgments until length information has
been gathered. However, the present judgments do not depend on duration. In
the fricative-vowel syllables, the place of the fricative is completely
determined by the noise. Length could, in some cases, determine voicing. But
there is no apparent reason for waiting until after the transitions have been
processed to make the place decision. Thus the speech mechanism seems to
integrate all cues available not only across the frequency range, but also
across the time and frequency ranges together.

It might appear that the difference between the integrating and disposing
accounts is the size of the time frame for analysis. This is not the case.
The primary distinction is that disposing accounts wish to treat each time
slice as a single (auditory) event and to extract all information from just
its "gross spectral shape" (Blumstein, Isaacs, & Mertus, 1982; Stevens, 1980).
A disposing theory with a large time window would thus need to extract, for
example, both the stop consonant and the vowel from one spectral shape. If,
on the other hand, the temporal window is increased but more than one spectral
analysis is done, then the two theories would be indistinguishable.

Listeners do accumulate information about phones during the reception of 
the speech signal. It is possible for them, in the proper paradigm, to make 
decisions of fricative identity based on the noise alone (Repp, 1981a). The 
accumulation of cues, then, is continuous, even if adjustments to their values 
are made in response to later cues. When the whole signal must be processed, 
as in the identification of the vowel in the present experiments, the 
integration of cues seems to take place consistently. 

The present results do not tell us very specifically just how long a 
listener waits before she reaches a decision. Recent work by Martin and 
Bunnell (1982) shows that vowel-to-vowel coarticulation, manipulated in much 
the same way as the present stimuli, holds across intervening consonants. 
Thus the syllable is not the absolute limit to the subcategorical matching 
process. A transient cue like a set of formant transitions, though, may be 
more tightly bound to the syllable in which it occurs. Only further 
experimentation will decide the issue. 

The delays in identification due to phonetic mismatches are small but 
highly reliable. This suggests that subjects are not overly concerned that 
one or two minor variations are introduced, but must still take the time to 
integrate the cues processed. But consider the problem with synthetic speech. 
Unlike natural speech, which has almost everything right, synthetic speech has 
just barely enough right to be understood. Even "fully" intelligible 
synthesis may impose an unacceptable processing load for general usefulness. 
Those features that make a synthesized syllable just a bit harder to process 
(for example, getting the transitions slightly wrong after fricatives) may not 
be apparent even to the most critical listener. Yet the small delays may be 
adding up, requiring more time to be spent on phonetic processing, and leaving 
less time for semantic processing. If synthetic speech is to be listened to 
for long periods with the intention of getting the content straight, the 
synthesis must be more than interpretable. It must be accurate in ways that 
the person doing the synthesis cannot hear directly (cf. Nye & Gaitenby, 1973; 
Pisoni, 1982). 

Finally, it should be noted that the proposed attempt by the listener to 
make sense of all she hears does not contradict the evidence that she can 
restore parts of the signal that are missing (Samuel, 1981; Warren, 1970). 
There is a difference between a lack of information and the presence of 
conflicting information. A demonstration of just that distinction in the 
present paradigm is being planned. But for now, we still have further 
evidence that the listener knows what a possible articulation is and attempts 
to integrate all cues in the construction of her phonetic percept. 

TOWARD A DYNAMICAL ACCOUNT OF MOTOR MEMORY AND CONTROL* 

Elliot L. Saltzman and J. A. Scott Kelso† 



1. INTRODUCTION 

Recent approaches to problems of complex, coordinated movement have 
emphasized that motor control arises from the task-specific dynamic system 
defined in a given actor-environment context. We suspect that motor learning 
and motor memory phenomena are likewise grounded in movement dynamics. Hence, 
a reformulation of certain memory and learning problems with reference to 
dynamic principles is undertaken here as a necessary first step. In the 
following sections we will: a) offer a constructively critical overview of 
several assumptions evident in current work on motor memory; b) attempt to 
sketch out a generalized type of dynamics that might lead to a unified 
approach to problems in sensorimotor control, learning, and memory; and c) 
offer a brief and speculative reformulation of questions relating to short 
term motor memory phenomena. 

2. MOTOR MEMORY AND CONTROL: 
CRITICAL REMARKS ON SOME QUESTIONABLE ASSUMPTIONS 

Considerable empirical advances have been made in the areas of motor 
memory and control in the last decade, yet we perceive some undercurrents 
among our colleagues to the effect that progress has become stunted, 
particularly in the memory field.1 This may be a general trend, arising from the 
realization that much more attention needs to be paid in the first place to 
the information being detected and used in the functional context of 
sensorimotor tasks, before we can ascertain anything about the nature of memory 
processes themselves. Even the standard metaphors of the memory theorist-- 
such as storage and retrieval--have been seriously questioned (e.g., Bates, 
1980). To be sure, something changes as animals behave adaptively with 
respect to their environments, and that something allows new performances to 
occur and old ones to be improved upon. But what changes? And why does such 
change persist? 

Convention has it that what changes is some thing or accumulation of 
things in the animal itself--an assumption that may be only partially correct. 
This assumption has been sufficiently enticing, however, to lead the 
biochemist and the neuroscientist to seek structural changes in so-called "simple" 
organisms as a function of various conditioning regimes (cf. Kandel, 1976; 
Thompson, 1976). The physicochemical basis of the "engram" is a hotly pursued 
topic of research that is laden with hidden assumptions, a primary one being 
that engrams exist to begin with. One can readily see some of the problems 
here; even in species with low numbers of neurons, it has not been possible to 
relate neuronal patterns to behavior isomorphically (cf. Selverston, 1980). 
"Context" continues to plague and puzzle us. Even the ethological concept of 
"fixed action pattern" as a behavioral counterpart to a unique set of neural 
events is under heavy fire at the moment in studies using the very organisms 
that Lorenz used to establish the idea. Bellman (1979), for example, has 
shown that the lizard (Sceloporus) does not resolve competition between two 
behaviors (e.g., aggression and eating) by choosing one and suppressing 
others. Rather, the lizard's response to conflict is rich and varied. In 
what she calls "merging" (to contrast with a single type of competition 
resolution), elements of both behaviors are seen, as reflected posturally in 
limb configurations, temporally in the movements themselves, and spatially in 
overall orientation. These observations suggest strongly that fixed units of 
behavior are not selected as a whole in immutable form. The consequences are 
obvious for a theory of engrams that are isomorphically related to specific 
behavioral patterns.2 

In the realm of psychology, few find it appealing to propose individual 
memorial counterparts for every possible behavioral variation. All 
nevertheless assume that something is stored, that information is somehow accumulated, 
that skills and habits are things that are acquired. In this style of 
thought, representations exist under a number of various guises--templates, 
perceptual traces, internal models, schemas, generalized motor programs, and 
the like. Our intent here is not to commence a diatribe against 
representationalism (but see Fodor & Pylyshyn, 1981; Turvey, Shaw, Reed, & Mace, 1981, 
for a lively debate). Rather, we would like to raise some questions about 
certain assumptions that seem implicit in current approaches to motor memory 
and control in order to suggest alternative styles of inquiry to those that 
presently predominate. 

Often the way we ask questions determines what solutions we expect. 
Perhaps asking the question differently or changing its focus will allow, if 
not new insights, then at least an elaboration of perspectives that can be 
differentiated. We think that it can be argued justifiably that current 
approaches to memory and control are dominated by certain singular themes (or 
styles of inquiry) that most have agreed on. Differences in perspective are 
nested within the same style of inquiry; they may be more a product of the 
manipulations that people perform in their experiments than any fundamental 
difference in outlook. If correct, this intuition suggests a reason for our 
stymied progress. Rather than variations on a theme, perhaps we need 
contrasting themes (cf. Kelso, 1981; Kugler, Kelso, & Turvey, 1982). One of 
the aims of this paper, following recent theoretical and empirical work on 
complex, functionally defined coordinated activities, is to promote dynamical 
principles on which to ground an understanding of motor memory and control. 
We will attempt to sketch out a generalized type of abstractly defined 
dynamics that may provide a departure point toward solving certain 
long-standing problems in the memory and control area. But, since our role here is 
to provoke and perturb, let us first do some consciousness-raising on the 
status of what we perceive to be the status quo. 

2.1. Assumption 1: Skill Development as the Accumulation or Construction of 
Cognitive Representations 

The acquisition of skill is difficult to understand, according to 
Assumption 1, without assuming that practice allows us to store a large number 
of movement patterns, or, more correctly some say, the perceived consequences 
of our actions. Whether we abstract out the key features of how the movements 
were produced and call it a schema or generalized motor program is not really 
the issue. The issue is the universal agreement that we accumulate, abstract, 
or construct something that is stored centrally as a memory or knowledge 
structure. For example, a common view of skills such as boxing that demand 
fast reactions of the performer is that people: 

"...use cues in the situation to tell what will probably happen 
next: They anticipate. This constitutes a cognitive skill. 
(Italics ours) Redundancy inherent in the situation is stored in 
memory. The skilled person has quick access to that knowledge 
structure that allows prediction and anticipation." (Keele, 1982, 
p. 157) 

And, further analogizing from research on cognitive skills such as chess, 
Keele (1982) offers the idea that skill depends "...largely on extended 
practice, involving thousands of hours. In that time people accumulate a 
'vocabulary' of thousands of patterns (or situations) that they can recognize, 
and they build an extensive repertoire of strategies and responses to deal 
with those patterns" (p. 139). 


To be fair to Keele, these ideas are advanced as "quite speculative" and 
hypothetical. However, they are not at all unusual in the area of motor 
skills. Most would offer little argument and there is certainly a growing 
consensus that motor skills have a heavy cognitive component (at least 
initially), and that action sequences are centrally represented even in the 
highly skilled. Yet it might be a mistake to place skilled behavior in the 
cognitive domain--at least perceptual-motor ones like boxing. And it might be 
a mistake to assume that the brain or mind contains remnants of our 
experiences--cognitive and otherwise. An alternative to this accumulative or 
constructive view of skill acquisition is one that does not appeal to 
cognitive operations to make sense of incoming stimuli, but that rather 
suggests that the information being picked up becomes more and more precise 
and subtle as skill develops. This view argues that the skilled performer 
becomes attuned to increasingly subtle perceptual information as a function of 
experience (cf. Gibson, 1966, 1979). The contrasting perspectives afforded by 
the accumulation/construction versus attunement approaches represent entirely 
different theoretical accounts for the simple fact that experience changes the 
animal (Michaels & Carello, 1981). According to the latter alternative we do 
not become skilled by increasing the number or complexity of memories (or 
knowledge structures) in the animal's brain; rather, we discover and become 
sensitive to, i.e., resonate to (cf. Gibson, 1966, 1979, and commentaries by 
Mace, Runeson, and Grossberg on Ullman, 1980) increasingly complex and 
differentiated information structures realized by events defined over the 
actor and environment. In Runeson's (1977) terms, we become increasingly 
smarter special purpose devices,3 attuned to complex information that is 
always available for detection in terms that are unique and specific to the 
acts that animals perform. Prediction and anticipation are consequences of 
this characterization, i.e., information is specific to what can be done 
(prediction) and when it can be done (anticipation). Our ability to use such 
information is exquisite. Two examples will illustrate these points. 

Todd (1981) has considered the outfielder's problem of trying to catch a 
fly ball in terms of the visual information currently available that specifies 
whether the ball will land behind or in front of the fielder's present 
position. Todd identified several sources of such "predictive" information 
and demonstrated, using animated computer displays, that subjects could detect 
and use such information in perceptual judgments. In fact, it appeared that 
subjects were sensitive to information specified in the following relation 
between optic and physical variables, in which optic variables refer to the 
projection of the physical event onto a two dimensional planar surface: 

$$-\frac{A_Y}{2R} > \frac{V_{Y'}\,V_{R'}}{(R')^{2}} \qquad (1)$$ 

where $A_Y$ = physical vertical acceleration of gravity, 

$R$ = physical diameter of ball, 

$V_{Y'}$ = optic vertical ball velocity, 

$V_{R'}$ = optic ball dilation velocity, and 
$R'$ = optic ball diameter. 

When equation (1) is satisfied, the ball will land in front of the observer. 
Note that the visual information specifying final landing point relative to 
the observer is available throughout the ball's trajectory. In other words, 
the information available at a given point in time is "predictive" in that it 
specifies a task-relevant spatial relationship that will occur subsequent to 
that point in time. Note that for this relation to be useful, the observer 
must be sensitive to (and presumably must discover) the critical ratio, AY/2R, 
between the acceleration due to gravity and ball size. Presumably, the 
observer's sensorimotor system is posturally familiar with the gravity vector; 
however, information specifying the ball size and hence the critical ratio 
obviously depends on the specific ball-skill context (i.e., baseball, 
softball, basketball, etc.). 
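
The relation in equation (1) can be checked numerically. The following is a 
minimal sketch, not Todd's procedure: it assumes a point eye at ground level, 
a projection plane at unit distance from the eye, a ball approaching along the 
line of sight at constant horizontal velocity, and no air resistance. All 
function names and numerical values are illustrative assumptions. 

```python
import math

G = -9.81  # A_Y: vertical acceleration of gravity (m/s^2); negative = downward


def optic_variables(y, z, v_y, v_z, r):
    """Project the ball onto a plane at unit distance from the eye (assumed).

    y, z     : ball height and distance relative to the eye (m)
    v_y, v_z : vertical and approach velocities (m/s); v_z < 0 = approaching
    r        : physical ball diameter (m)
    """
    r_opt = r / z                        # R'  : optic ball diameter
    v_r_opt = -r * v_z / z**2            # VR' : optic ball dilation velocity
    v_y_opt = v_y / z - y * v_z / z**2   # VY' : optic vertical ball velocity
    return r_opt, v_r_opt, v_y_opt


def lands_in_front_optic(y, z, v_y, v_z, r, a_y=G):
    """Equation (1): -A_Y / 2R > VY' * VR' / (R')^2."""
    r_opt, v_r_opt, v_y_opt = optic_variables(y, z, v_y, v_z, r)
    return -a_y / (2 * r) > v_y_opt * v_r_opt / r_opt**2


def lands_in_front_physical(y, z, v_y, v_z, a_y=G):
    """Ground truth: solve the ballistic trajectory for the landing point."""
    t_land = (-v_y - math.sqrt(v_y**2 - 2 * a_y * y)) / a_y
    return z + v_z * t_land > 0


def state_at(t, y0, z0, v_y0, v_z, a_y=G):
    """Ball height, distance, and vertical velocity t seconds later."""
    return y0 + v_y0 * t + 0.5 * a_y * t**2, z0 + v_z * t, v_y0 + a_y * t
```

Under these assumptions the optic criterion gives the same answer at every 
instant of the flight as the full physical solution, which is the sense in 
which the information is "predictive": the observer needs no record of the 
earlier trajectory, only the current optic values and the ratio AY/2R. 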

The second example of intrinsically predictive visual information is due 
to Lee (1976), who identified the optic invariant specifying the 
time-to-contact of an object approaching an observer (or vice versa) at a constant 
velocity along the line of sight. This information is specified by: 

$$\frac{1}{\text{rate of dilation}} \qquad (2)$$ 

where the rate of dilation is that of the retinal image of the object. When the 
observer is driving a car and approaching a stationary obstacle, such 
information specifies time-to-collision. In this context, Lee described 
time-to-collision margin values at which the driver would have to start 
decelerating with a given braking power when traveling at a given current velocity in 
order to stop short of the obstacle (assuming steering controls are ignored). 
With reference to problems of coordinated movement, we should point out (in 
the spirit of Warren & Shaw's (1981) discussion) that such margin values may 
be used to scale spatiotemporal perceptual information to the power-generating 
capacities of the actor in a given task situation. For example, there exists 
a margin value at which one can initiate a successful jump when 
running toward a jumpable obstacle at a given speed. This time-to-jump margin 
value will vary across organisms with different power to body-mass ratios, 
i.e., organisms with greater ratios can initiate successful jumps 
at smaller margin values. 
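
The braking example can be made concrete with a small sketch. It is an 
illustration under stated assumptions, not Lee's own formulation: time-to-collision 
is taken here as the visual angle of the obstacle divided by its rate 
of change, the approach speed is constant, and "braking power" is reduced to a 
single constant maximum deceleration. The function names and numbers are 
hypothetical. 

```python
import math


def visual_angle(size, distance):
    """Angle (rad) subtended at the eye by an object of the given size."""
    return 2.0 * math.atan(size / (2.0 * distance))


def dilation_rate(size, distance, speed):
    """Rate of change of the visual angle (rad/s) at a constant closing speed."""
    return size * speed / (distance**2 + (size / 2.0) ** 2)


def tau(size, distance, speed):
    """Optically specified time-to-collision; close to distance / speed
    whenever the object is small relative to its distance."""
    return visual_angle(size, distance) / dilation_rate(size, distance, speed)


def braking_margin(speed, max_decel):
    """Smallest time-to-collision at which braking can still succeed.

    Stopping distance is speed**2 / (2 * max_decel); dividing by speed
    expresses that distance as a time-to-collision margin value.
    """
    return speed / (2.0 * max_decel)


def must_start_braking(tau_now, speed, max_decel):
    """True once the current time-to-collision has shrunk to the margin."""
    return tau_now <= braking_margin(speed, max_decel)
```

In this sketch the margin value shrinks as the available deceleration grows, 
which parallels the point made above about jumping: an actor with greater 
power-generating capacity can wait until a smaller margin value before acting. 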

2.2. Assumption 2: General Purpose Processes and Devices 

Those of us who were in graduate courses in psychology on learning in the 
1960s and 1970s were likely impressed by the enormous efforts of our 
predecessors to provide a general theory of learning. This was truly an 
admirable goal and most of us would still like to believe that a small set of 
general principles underlies all forms of learning. A claim that has recently 
been made (Johnston, 1981) is that such general principles should be sought in 
the relationships between animals and their natural environments. This 
ecological approach contrasts with previous "general process" efforts that 
have restricted their studies to defining the characteristics of animals 
themselves. For example, a tacit assumption of the latter type of approach 
was what Seligman (1970) called the "equivalence of associability" assumption, 
that it was equally possible to learn any relationship between stimulus and 
response. Much recent work, however, has shown that there are biological 
constraints on what can be learned (e.g., Bolles, 1972). Animals do not 
operate in universal contexts; they are not general-purpose machines. The 
elegant conditioning experiments of Garcia and colleagues attest to this claim 
(e.g., Garcia, 1981; Garcia & Koelling, 1966, for review). Briefly, Garcia 
showed that rats can learn to avoid sweet-tasting water when it is paired with 
toxicosis, but not if it is paired with foot-shock. Moreover, in the former 
case the pairing does not have to be temporally contiguous; delaying the 
noxious US (unconditional stimulus) up to two hours still resulted in learning 
to avoid the sweet-tasting water (CS). All of this evidence (and much more, 
see Johnston, 1981) contravenes the principle of equivalence of associability 
and strongly supports the view that those activities that are part of the 
animal's natural habitat or niche can be learned easily while others cannot. 

The biological-constraints perspective appears to have had no visible 
impact in the motor behavior literature (where it should be most relevant). 
For example, it was totally ignored in a recent meeting on motor memory and 
learning (North American Society for the Psychology of Sport and Physical 
Activity, Asilomar, CA, 1981). The area of motor memory, borrowing heavily 
from the verbal learning area, continues to deal with "items of information" 
or "items to be remembered" as its relevant stimuli. In fact, the more novel 
and arbitrary the "item" to the activities that people perform--so the 
argument goes--the better we are able to understand how new "items" are 
learned and remembered. This view of movements as "items" is a vestige of 
associationism; in fact it is associationism (cf. Jenkins, 1979). It assumes 
that perception, learning, and memory are general-purpose processes; it 
assumes that anything that will produce an effect constitutes a stimulus item; 
it evokes descriptions of the information base that are animal-neutral (hence 
"items"); it rejects the claim--supported by much recent work--that behavior 
is constrained by particular aspects of environmental structure to which an 
animal is sensitive. According to Assumption 2, then, movements are learned, 
controlled, and remembered by general purpose devices that process movement 
information in the same manner regardless of the functional or task context. 
It should be noted that this assumption is evident not only in human motor 
control and memory research, but also in the field of robotics. Thus, for 
example, it has been generally assumed that robot limbs can perform different 
tasks according to the same general purpose planning and control operations, 
e.g., joint velocity planning and servoing for both manipulator arms (e.g., 
Whitney, 1972) and hexapod walker legs (e.g., McGhee & Iswandhi, 1979). 

In contrast with the general purpose approach, we wish to argue that 
motor learning, memory, and control processes are not neutral to an action's 
functional or task context. In this regard, one assertive claim to be made 
here is that we should reject "items" as constituting the what of memory, just 
as we should reject "muscles" (admittedly less arbitrary to the control of 
activity than "items" are to memory) as the what of control and coordination 
(cf. Kelso & Saltzman, in press). Instead, we should give a good deal of 
thought to the types of tasks organisms (including humans) perform, in 
recognition of the fact that tasks that meet existing constraints are easier 
to perform than others that do not. Consequently, any natural informational 
units that may be relevant to understanding that which we call memory and 
control need be defined functionally; that is, with respect to the tasks that 
animals can perform. General purpose theories of control and memory are too 
powerful in this regard, because they offer viable accounts of phenomena that 
never occur naturally as well as those that do. They fail to acknowledge that 
evolution and development play an economizing role by restricting the types of 
activity that creatures perform to those that are behaviorally useful. 

We have invested a good deal of effort in identifying what we believe to 
be significant units of control. These are not individual degrees of freedom 
of the system like muscles, or preestablished arrangements between receptor 
and effector elements (the reflexes that Sherrington (1906) referred to as 
"likely fictions"), or prescribed arrangements among instructions (central 
programs, etc.). Rather, they are functionally specified ensembles of muscles 
and joints that act as coherent units during task performances and whose 
component elements vary autonomously in a mutually constrained manner (e.g., 
Boylls, 1975; Fowler, 1977; Greene, 1971, Note 1; Kelso, Southard, & Goodman, 
1979; Saltzman, 1979; Turvey, Shaw, & Mace, 1978). We shall have much more to 
say about the organization of these action units as discussion proceeds. 


2.3. Assumption 3: Cues and Features 

An extension of the "movement as a to-be-remembered item" approach is to 
parcel up the movement and identify the various "features" or "cues" that 
could be coded by a subject in a reproduction task.4 Thus the problem for 
motor memory becomes one of identifying which cues are "codable" and which are 
not. The prototypical case is the distinction between distance and location 
cues--an issue that on its own must have provoked thirty or forty articles. 
If one accepts that these aspects of movement can in fact be differentiated, 
the result is that location reproduction is better than distance. Numerous 
accounts have been offered for this finding. Many of the early studies (and 
many of the later ones as well) argued that location is more effectively 
reproduced because there are kinesthetic receptors for joint position but not 
for distance (but see Kelso, Holt, & Flatt, 1980), or that distance is less 
directly coded because it requires an interpolation of velocity. Another type 
of interpretation followed Lashley's idea of a space coordinate system. Limb 
positions were thought to be more easily coded than distance because they were 
referred to an internal representation of spatial coordinates rather than 
being kinesthetically determined. Thus, identical spatial positions could be 
reproduced with either limb (as long as direction of movement remained 
constant) and would not require the continuous availability of kinesthetic 
information from the same limb (cf. Wallace, 1977). More recent 
interpretations have kept in vogue with the visual and verbal memory literature. With 
respect to the former, information about end location has been viewed as 
"centrally arousing a visuo-spatial map" for "retrieval purposes in subsequent 
reproduction" (Housner & Hoffman, 1981). With respect to the latter there has 
been a good deal of attention given to using verbal labels as retrieval cues 
for movement positions (e.g., Shea, 1977) or to subjecting location to greater 
depths of processing (cf. Craik & Lockhart, 1972). Thus location "persists" 
because it can be analyzed more deeply than distance. 

All of these accounts commit what has been called a first-order 
isomorphism fallacy (FOIF for short; Summerfield, Cutting, Frishberg, Lane, 
Lindblom, Runeson, Shaw, Studdert-Kennedy, & Turvey, 1980), namely, of taking 
the predicates that result from describing or observing a phenomenon (e.g., 
the position of a limb), assigning those predicates to a memory structure in 
the brain (e.g., as a location code, a visuo-spatial map, perceptual trace...) 
and of claiming, thereby, to explain the phenomenon. One problem with this 
strategy, of course, is that we could take any observable kinematic or kinetic 
movement feature (e.g., relative force, movement distance or duration, hand 
location, etc.) to which an organism is behaviorally sensitive and posit an 
entity in the head that is responsible for detecting, coding, or remembering 
it. The same criticism also applies to studies of motor control that 
investigate the so-called "content" or "structure" of central motor programs. 
Thus, reaction time to initiate a movement can be related to many measurable 
or observable dimensions of upcoming movement with little or no guarantee that 
the said dimension is coded in the motor program (cf. Kelso, 1981). Assigning 
movement cues and various kinematic/kinetic dimensions to isomorphic memorial 
counterparts as agents of recall and regulation is tautological, and appears 
to confirm only the assumptions of the experimenter. 

This FOIF is not restricted, however, to research in control and memory 
of limb movements; it is common in speech perception research as well. There 
the concept of detectors for phonetic contrasts has gained prominence even 
though virtually every such contrast differs along many distinct dimensions 
(e.g., Liberman, 1982; Studdert-Kennedy, 1982). Is there a contrast detector 
for each dimension or cue? Consider the well-studied case of voicing 
distinctions in stop consonants, e.g., /b/ versus /p/ (Lisker & Abramson, 
1964). Up to now nearly twenty different cues have been found that 
distinguish the contrast, among them aspiration energy, first formant onset 
frequency, fundamental frequency, the timing of laryngeal action, and burst energy. 
No limit for the number of possibilities--according to some authors--is in 
sight (e.g., Bailey & Summerfield, 1980; Lisker, 1978). 

In short, many studies in motor control and memory (as well as in other 
areas, e.g., speech perception) have revealed that organisms can respond to a 
wide range of isolable and distinctive event features that experimenters 
manipulate. Such behavioral data, however, do not constitute evidence for the 
psychological reality of the corresponding isomorphic feature codes or 
detectors. 

3. MOTOR CONTROL: A GENERALIZED DYNAMICAL PERSPECTIVE 

A recent theoretical approach to motor control (cf. Fitch & Turvey, 1978; 
Fowler, 1977; Fowler & Turvey, 1978; Greene, 1972; Kugler, Kelso, & Turvey, 
1980) has looked to nested structures of constraints on dynamic system 
parameters (e.g., stiffness and damping coefficients) as sources of movement 
organization. According to this view, higher order global constraints specify 
a pattern of such parameters that allows the limbs (or any articulators) to 
become task-specific, functionally defined, special purpose devices. This 
constraint structure will be referred to below as the organizational invariant 
(cf. Fowler & Turvey, 1978) characterizing a given action type. Lower order, 
local constraints specify values for those parameters left free to vary once 
the global constraints have been implemented. We shall refer to these local 
constraints as tuning parameters.5 For example, the arm will behave as a 
reaching device if globally constrained by the organizational invariant to 
behave as a damped mass-spring system; and the leg will behave as a hopping 
device if constrained to behave as a limit cycle system. These global 
functions may be tuned by local constraints specified by perceptual 
information specific to the immediate actor-environment context. Thus, the reaching 
arm will self-equilibrate to a value specified by the perceived location of 
the target, and the hopping leg will cyclically attain a peak hopping height 
specified by the perceived heights of hop-overable obstacles in the path of 
locomotion.6 
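
The distinction between an organizational invariant and its tuning can be 
illustrated with two toy simulations. This is a hedged sketch only: the 
second-order equations, the van der Pol oscillator chosen as the limit cycle 
system, and every parameter value below are assumptions made for illustration 
and are not taken from the work cited above. 

```python
def reach(start, target, stiffness=25.0, damping=10.0, mass=1.0,
          dt=0.001, duration=5.0):
    """Damped mass-spring 'arm': mass*x'' = -stiffness*(x - target) - damping*x'.

    The equation form plays the role of the global constraint; `target`
    (the equilibrium position) is the locally tuned parameter.
    Returns the final position after `duration` seconds.
    """
    x, v = start, 0.0
    for _ in range(round(duration / dt)):
        a = (-stiffness * (x - target) - damping * v) / mass
        v += a * dt          # semi-implicit Euler integration step
        x += v * dt
    return x


def hop_amplitude(start, mu=1.0, dt=0.001, duration=60.0, tail=10.0):
    """Van der Pol limit cycle 'leg': x'' = mu*(1 - x**2)*x' - x.

    Returns the peak |x| over the last `tail` seconds, i.e. the amplitude
    of the steady oscillation reached from the given starting position.
    """
    x, v = start, 0.0
    steps = round(duration / dt)
    tail_start = steps - round(tail / dt)
    peak = 0.0
    for i in range(steps):
        a = mu * (1.0 - x * x) * v - x
        v += a * dt
        x += v * dt
        if i >= tail_start:
            peak = max(peak, abs(x))
    return peak
```

In the first function the end position depends only on the tuned equilibrium 
value, not on where the movement began; in the second the steady oscillation 
amplitude is likewise independent of the starting state. Both properties 
follow from the type of dynamics rather than from any stored trajectory, which 
is the sense in which such a system can act as a special purpose device. 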

We would like to promote a perspective on action that argues that 
coordinated movements are functionally defined and (ideally) adaptive events 
whose spatiotemporal coherence and power requirements are governed by the 
simultaneous confluence of global and local constraints. In this framework, 
defining one's units of analysis is a critical first step in understanding the 
bases of movement coordination and regulation. The argument has been made in 
numerous places (e.g., Bernstein, 1967; Boylls, 1975; Fowler, 1977; Gelfand, 
Gurfinkel, Tsetlin, & Shik, 1971; Greene, 1971; Kelso & Saltzman, in press; 
Kelso et al., 1979; Kugler, Kelso, & Turvey, 1980; Saltzman, 1979; Turvey, 
1977; Weiss, 1969) that single muscles and/or joints are not the proper 
elements with which to build an adequate theory of multiple degree of freedom 
systems able to perform sensorimotor tasks successfully in the real world. 
Rather, the appropriate elements are collectives of muscles/joints that act as 
coherent units according to the global, functionally specific task constraints 
defined across actor and environment. Such units have been called synergies, 
coordinative structures, linkages, etc. These terms reflect the synchronic or 
spatial coherence that this type of constraint organization bestows upon the 
actor's musculoskeletal system. Thus, if one analyzes a movement into 
discrete time slices, such synchronic organization may be observable as ratios 
of muscle activity or joint motion that remain relatively invariant across 
time slices. Although such time slice descriptions are useful for movement 
analysis and robotics control applications, one should not be seduced into 
thinking that coordinated, biologically controlled actions can be reduced to 
transformationally related, time slice concatenations of linkage motions. 
Biological actions are best viewed as events that have diachronic or temporal 
as well as spatial coherence; they span a characteristic, intrinsically 
defined period of time according to the global, task-specific function by 
which the movement is organized. This position echoes Bernstein's (1967) 
assertion that movements may be likened to morphological objects in that "they 
do not exist as homogeneous wholes at every moment but develop in time, that 
in their essence they incorporate time coordinates" (p. 68). Further, 
"movements are not chains of details but structures which are differentiated 
into details" (ibid., p. 69).7 

Finally, biological actions are characterised not only by their 
spatiotemporal properties but also by their power-generation requirements.
Consider, for example, running to intercept a soccer pass. For this task to
be successfully accomplished, information must be specified about where the
ball is spatially, where and when it will arrive at an interceptable location,
and how much energy must be dissipated by the body to reach that particular
space-time locale (Lee, 1980). The earlier discussion of Lee's (1976) braking
problem and the time-to-collision margin values (see Assumption ^ section) 
underlines the relations between perceptual information and energetic
constraints on activity. Let us now proceed to a more detailed treatment of 
organisational invariants and the rather abstract bases of their dynamic 
organisation. 

3.1. Organizational Invariants, Degrees of Freedom, and Task-spatial Axes

It is worth emphasizing that skilled actions are goal-directed. Such
goals are defined in terms of environmental outcomes that are relevant to the
actor's desires and current behavioral repertoire. For example, skills
entailing the limbs typically involve creating characteristic patterns of
motion and/or force at the limb-environment interface; speech entails
articulator motions that shape the vocal tract to create characteristic
acoustic energy patterns in the airstream produced by the lungs. In all
cases, however, the effectors relevant to the task are parts of a coherent
multi-degree of freedom ensemble. The coherence of such ensembles arises from
the functionally specified, task-level structure of constraints (i.e., the
geometry of constraints) defined over the dynamic system spanning the actor
and environment. Thus, for example, the act of reaching entails a global,
functional organization of the joints and muscles in the arm that guides the
hand to a target under the influence of gravity. It is reasonable to 
hypothesise that this organisation is invariant across different specific 
instances of reaching. Fowler and Turvey (1978) have spoken of such global 
principles as comprising the organisational invariant of a coordination 
problem, as the "function that is preserved invariantly over changes in the 
specific values of its variables" (p. 23).

In this framework, understanding the functional basis of a particular
skill involves discovering the system of global control constraints that
characterizes that skill's organizational invariant. Such discovery presumably
underlies both the developmental/skill acquisition process and the
process of analyzing experimentally the skilled performance of well-learned
behavior. Obviously, there is an important difference between the discovery
tasks in the two cases. Adapting Pattee's (1973) discussion of the origin and
operation of natural control systems to the present issue of skilled actions,
we may say that the problem of the origin of a skilled behavior is quite
distinct from the problem of the performance of a skilled behavior. The basic
distinction is that the performance of skilled actions assumes the existence
of an organized system of control constraints, whereas the origin problem must
account for the establishment of these constraints. Such origins "must begin
with low selectivity and imprecise function and gradually sharpen up to high
specificity and narrow, precise function" (Pattee, 1973, p. 41).

There is a curious and possibly significant parallel between the 
discovery processes of the unskilled novice and the uninformed scientist. It 
might be justifiably argued that the novice's and the movement scientist's
understanding of the organizational invariant underlying a particular skill
may be progressively facilitated by gradually increasing the number of degrees
of freedom controlled or measured during performances of coordinated actions
relevant to the skill. In the case of skill acquisition, one can characterize
the early stages of learning in both adults and children by a tendency to keep
much of the body relatively stiff or rigid, thereby reducing the kinematic and
kinetic complexity of the performed movement (e.g., Benati, Gaglio, Morasso,
Tagliasco, & Zaccaria, 1980; Bernstein, 1967; Fowler & Turvey, 1978; Saltzman,
1979; Wickstrom, 1977). Further refinements of skill are then said to entail
selective relaxation of these constraints (i.e., differentiation of the
constraint structure), guided by the progressive discovery of the patterning
of reactive forces supplied by the functionally coupled dynamic system of
actor and environment. The early rigidity or stiffening control constraints
on the kinematics and kinetics of limb movements may be likened to the
physical constraints provided by training wheels on the motions allowed and
forces encountered by a novice bicycle rider. Essentially, these early
constraints play two roles. The first is to provide a rough approximation of
the skilled action that nevertheless achieves the relevant goal, i.e.,
satisfies a crudely specified organizational invariant. The second is to
facilitate the discovery of the supporting dynamics by extending the time
interval over which task-stability is preserved (i.e., the bicycle moves in a
controlled manner without falling over). According to Fowler and Turvey
(1978), the organizational invariant for a skill is information specific to
the underlying, functionally constrained dynamics of that skill. Such
information by definition remains invariant and is revealed through time over
transformations relevant to that skill. Extending the temporal range of task
stability thereby increases the range of time spanned by these exploratory
transformations, and enhances correspondingly the discovery and
differentiation process. 

In the case of the scientist's analysis of a well-learned skill, one can
similarly observe that increasing the allowable degrees of freedom of movement
in the experimental task can reveal progressively more subtle aspects of the
organizational invariant underlying that skill. Consider, for example, the
well-known mass-spring model of limb control in target acquisition tasks.
Many recent studies in motor control involving positional control at a single
joint have led to the conception that such movements are controlled by a
system qualitatively similar to a (nonlinear) mass-spring system (e.g.,
Fel'dman, 1966; Kelso, 1977; Kelso, Holt, Kugler, & Turvey, 1980). These
movements are characterized by their equifinality in that a given target angle
may be achieved despite variation in initial position and despite perturbations
to the movement trajectory imposed en route to the target. Fel'dman
(1966, 1980) and others (e.g., Kelso, 1977; Kelso & Holt, 1980; Polit & Bizzi,
1978; Schmidt & McGown, 1980) have described such systems as rotational
mass-spring systems in which target angles are specified through controllable
agonist and antagonist muscle equilibrium lengths. If one were to stop here,
one would assume that the organizational invariant underlying such tasks was
defined relative to joint-level control systems. However, these tasks are
highly constrained instances of well-practiced reaching or pointing actions
that are normally defined functionally over time, three spatial dimensions,
and the multiple joint hand-arm-trunk linkage system. It is reasonable to
assume, then, that the organizational invariant governing the simple joint
positional control case represents a constrained version of the global
constraint structure underlying the more generalized reaching/pointing skill.
That is, one is led to suspect that the mass-spring organization discovered in
single joint tasks might not be tied literally to control at single joints,
but might rather indicate a more abstract functional mode of organization
characteristic of reaching and pointing tasks in general. Since this
characterization is one of function and not mechanism, however, it may account for
the qualitative behavior of a wide variety of materially different systems
(e.g., the compensatory behavior of the jaw and lips to unexpected
perturbations, the invariant position of the hip prior to the swing through of the leg
in the step cycle).
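
The equifinality property described above can be illustrated with a minimal numerical sketch. It assumes a linear, damped rotational mass-spring (the studies cited allow for nonlinearity), and the stiffness, damping, inertia, and perturbing torque pulse are hypothetical values chosen only for illustration, not parameters taken from any of the cited experiments.

```python
def simulate_joint(theta0, theta_eq, stiffness=20.0, damping=6.0,
                   inertia=1.0, pulse=0.0, dt=0.001, duration=10.0):
    """Integrate a damped rotational mass-spring; return the final angle (rad).

    `pulse` is a hypothetical perturbing torque applied from t = 0.2 s
    to t = 0.3 s, standing in for a perturbation imposed en route.
    """
    theta, omega = theta0, 0.0
    steps = int(round(duration / dt))
    for i in range(steps):
        t = i * dt
        external = pulse if 0.2 <= t < 0.3 else 0.0
        torque = -stiffness * (theta - theta_eq) - damping * omega + external
        omega += (torque / inertia) * dt   # semi-implicit Euler: velocity first
        theta += omega * dt                # then position, using the new velocity
    return theta


target = 1.0  # equilibrium angle fixed by the assumed spring parameters
finals = [simulate_joint(start, target) for start in (-0.5, 0.2, 1.8)]
perturbed = simulate_joint(0.2, target, pulse=15.0)
```

Under these assumptions every run settles at the same equilibrium angle, whatever the initial position and whether or not the transient torque is applied. The final position is a property of the system's parameters rather than of any particular trajectory.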

Recently several investigators (Abend, Bizzi, & Morasso, in press;
Georgopoulos, Kalaska, & Massey, 1981; Morasso, 1981; Soechting & Lacquaniti,
1981; Wadman, Denier van der Gon, & Derksen, 1980) have supported such
suspicions in reaching studies involving two joints (shoulder and elbow) and
two spatial dimensions of hand motion. In these cases, they found a relative
invariance of the spatial properties of the hand trajectories across different
reaching movements. Typically, the hand moved in an approximately straight
line from initial to final positions, and exhibited a single-peaked velocity
curve in this tangential direction. If movements were organized solely with
respect to a target joint angle configuration, one would expect equifinality,
but not quasi-straight line trajectories. The existence of such trajectories
suggests that, in addition to specifying an equilibrium linkage configuration,
the stiffnesses across the joints are distributed to produce motion
approximately in the direction of the current target. It is interesting to note that
the single degree of freedom experiments may have precluded discovery of this
control constraint on spatial trajectory by physically prohibiting trajectory
variation in the non-tangential direction. Thus, relaxing constraints on the
degrees of freedom of motion allowed in the target acquisition paradigm has
actually enhanced our understanding of the organizational invariant governing
such tasks. 
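
The contrast drawn above between organization with respect to a joint angle configuration and organization with respect to hand trajectory can be made concrete with a purely kinematic sketch. It is not a model of joint stiffness distribution or of any cited experiment: the planar two-link arm, the segment lengths, and the start and goal postures are all hypothetical. The sketch compares the hand path produced by interpolating joint angles directly with a path interpolated in hand space.

```python
import math

L1, L2 = 0.3, 0.3  # assumed upper-arm and forearm lengths (m)


def hand_position(shoulder, elbow):
    """Forward kinematics of a planar two-link arm (angles in radians)."""
    x = L1 * math.cos(shoulder) + L2 * math.cos(shoulder + elbow)
    y = L1 * math.sin(shoulder) + L2 * math.sin(shoulder + elbow)
    return x, y


def max_deviation_from_chord(path):
    """Largest perpendicular distance of a path from its start-to-end line."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in path)


start, goal = (0.3, 2.0), (1.2, 0.8)  # hypothetical (shoulder, elbow) postures
fractions = [i / 100 for i in range(101)]

# Path 1: interpolate the joint angles directly (joint-space organization).
joint_path = [hand_position(start[0] + f * (goal[0] - start[0]),
                            start[1] + f * (goal[1] - start[1]))
              for f in fractions]

# Path 2: interpolate the hand position itself (task-space organization).
(hx0, hy0), (hx1, hy1) = hand_position(*start), hand_position(*goal)
hand_path = [(hx0 + f * (hx1 - hx0), hy0 + f * (hy1 - hy0)) for f in fractions]

joint_dev = max_deviation_from_chord(joint_path)
hand_dev = max_deviation_from_chord(hand_path)
```

Both paths end at the same hand position, so both show equifinality. Only the hand-space path is straight, however: for these assumed postures the joint-interpolated path bows away from the chord by a few centimeters.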

One might also speculate that relaxing the experimental constraints
further will result in yet richer characterizations of the reaching
organizational invariant. For example, if one restricted hand spatial motion to two
dimensions and allowed motion at three joints (shoulder, elbow, and wrist),
there would be no unique relationship between hand position and joint angle
configuration. If one again found spatial equifinality and trajectory
invariance, yet additionally found variation in final hand position-linkage
configuration relations, then one might conclude that the organizational invariant
underlying reaching tasks was abstract indeed (i.e., abstract in the mind of
the scientist, not necessarily abstract in the sense of mechanism). However,
just as the earlier invariances could be produced via specification of dynamic
system parameters (i.e., equilibrium angles, stiffness distribution), one
might again suspect that this configurational equivalence property of the
organizational invariant might also be based on dynamic principles.

The type of organizational invariant discussed above was specific to
reaching skills, and served to organize the acting upper limb functionally as
a special purpose reaching device. In this case, the hand behaved as though
governed by an abstract, spatially defined mass-spring system. Different
tasks, however, entail different organizational invariants through which the
limbs (or any set of articulators) become different functionally defined,
special purpose devices. One further brief example from the robotics
locomotion literature will illustrate this point. Raibert and his colleagues
(Raibert, Brown, Chepponis, Hastings, Shreve, & Wimberly, 1981) have described
two aspects of the organizational invariant governing lower limb control
during locomotion. They noted that legs do two things during walking or
running: a) they change length to establish a cyclic temporal framework of
vertical hopping (i.e., they alternate stance and transfer phases); and b)
they move back and forth to propel the body and provide balance. For present
purposes, we will focus on the vertical aspect, and note that the "vertical
controller" is organized to maintain a hopping cycle for any desired peak
hopping height of the body, i.e., this aspect of locomotor function is
organized with respect to the task-specific, spatially vertical axis between
the support surface and body center of mass. Furthermore, this spatially
invariant behavior is provided by an underlying limit cycle dynamic
organization, analogous to the "squirt" system involved in the escapement mechanism of
a pendulum clock. The pendulum clock's escapement mechanism, however, only
allows a constant force impulse to be injected on each cycle of pendulum
swing. Raibert et al.'s (1981) model of a locomoting device is more complex,
since it can adjust the size of the impulse on each cycle to maintain a
desired body height. Thus, the vertical behavior of this model shows
equifinality with respect to the vertical task-specific spatial (task-spatial)
axis, and appears to be organized according to an abstract, spatially defined
limit cycle system.
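
The difference between a fixed, clock-like impulse and an impulse resized on every cycle can be sketched with a toy apex-to-apex map. This is not the hopping model described above. It assumes that a fixed fraction of the body's energy survives each stance phase, and it expresses energy as height (energy per unit weight). The function names, the retained fraction, the gain, and the starting heights are all hypothetical.

```python
def hop_constant_impulse(h0, impulse=0.1, retained=0.6, hops=60):
    """Apex-to-apex map with a fixed energy injection per hop (clock-like)."""
    h = h0
    for _ in range(hops):
        h = retained * h + impulse
    return h


def hop_adjusted_impulse(h0, h_desired, retained=0.6, gain=0.3, hops=60):
    """Apex-to-apex map in which the injection is resized on every hop."""
    h = h0
    for _ in range(hops):
        # Make up the expected loss at the desired height, plus a correction
        # proportional to the current apex error; thrust cannot be negative.
        impulse = max(0.0, (1.0 - retained) * h_desired + gain * (h_desired - h))
        h = retained * h + impulse
    return h


starts = (0.2, 0.5, 1.0)  # hypothetical initial apex heights (m)
fixed = [hop_constant_impulse(h0) for h0 in starts]
tuned = [hop_adjusted_impulse(h0, h_desired=0.5) for h0 in starts]
```

In this sketch both maps are equifinal, since every starting height converges on a single apex height. With a constant impulse that height is dictated by the impulse and the losses, whereas with an adjustable impulse it can be set to any desired value, which is the distinction the text draws between the clock escapement and the locomoting device.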

In summary, we are thus led to the following informed intuitions 
concerning the organisational invariants underlying different functionally 
specified skills: a) they may be defined in a highly abstract, geometric 
manner relative to task-spatial axes; b) satisfying such abstract invariance 
across task instances may be allowed by appropriate specification of the 
underlying dynamic parameters that functionally characterise the linkage 
system in the current task-actor-environment context; and c) the subtleties of
the organizational invariant's structure may be progressively revealed and
differentiated by selectively increasing the controllable degrees of freedom
in the task at hand, and by permitting variation in the transformations
imposed on these degrees of freedom.

3.2. Motor Memory Revisited

In the introductory portion of this paper, we suggested that motor memory
phenomena might arise from dynamic aspects of movement. In Section 3.1 we
argued that the correct units of analysis for coordinated actions were
functional units defined in a task-specific manner across actor and
environment. A given coordinated movement was viewed as an event possessing
intrinsic spatial and temporal coherence, and a characteristic constraint
structure (an organizational invariant) was described that might provide such
coherence by establishing a functionally appropriate global organization over
the dynamic parameters of the actor's linkage system. The dynamics involved
were defined in an abstract manner, and governed behaviors showing point or
limit cycle stabilities relative to task-spatial locations or axes. 

If movement reproduction is a task that is sensitive to movement 
dynamics, it is sensitive to this highly abstract type of dynamics. From this
perspective, it is not surprising that spatial and/or joint configuration
equilibrium positions might persist over time, given the underlying
generalized task-spatial mass-spring system described above for reaching tasks.
Additionally, it may not be too surprising that the direction of motion toward
a target in such positioning tasks influences reproduction accuracy (e.g.,
Wallace, 1977), since trajectory direction was suggested to be controlled
dynamically by appropriate, perceptually specified constraints on the pattern
of linkage-joint stiffness parameters. Given that equilibrium configurations
and stiffness distributions may be characterized as local constraints (i.e.,
tuning parameters), one might arrive at the hypothesis that motor memory
phenomena are related to the relative persistence and stability
characteristics of tuning constraints. Suspecting such a relationship, we would wonder
why such a relationship should exist in the first place. Why might dynamical- 
ly defined tuning constraints persist at all? What is it about motor memory 
that it should be selectively sensitive to such motor control parameters? And 
finally, could motor memory itself be a consequence of a more ^general ability 
to detect control constraints persisting after movement execution? 

By couching one's questions concerning motor memory and learning in the
context of functionally specified and dynamically implemented global and local 
control constraints, we believe that the crude beginnings of a unified account 
of control, memory, and learning of coordinated actions may be within reach. 

4. CLOSING COMMENTS 

Here we would like to summarize briefly and selectively our main points: 

(1) There is information that is unique and specific to the organism's
dynamics and to the spatiotemporal and energy demands of the tasks
that organisms perform. Thus, attention to the informational basis
for knowing what to do, when to do it, and how to do it is a first
step to exploring mechanisms. In this regard, margin values of
detectable information may be discovered that are specific to an
action's power requirements. As skill develops, the detected
information pertaining to the guidance of activity becomes more subtle
and increasingly precise. Skill acquisition need not be equated
with the elaboration or strengthening of internal memorial knowledge
structures.

(2) The language of motor control and memory processes is not likely to
be one of cues or features based on a movement's observable or
measurable properties. We suggest instead that one look to the
underlying dynamic system parameterization that gives rise to a
movement's kinematic or kinetic observables. In other words,
dynamics is the language of motor memory and control. Such dynamics
are defined abstractly with respect to functional, task-spatially
defined locations or axes.

(3) Motor control and coordination are likely to fall under the rubric
of functionally specific, special-purpose processes. They are less
likely to depend on general process views obtained from other areas
of biology and psychology. The limbs can become different types of
functionally defined, special purpose devices for different types of
tasks by virtue of global constraints defined over the underlying
dynamic system parameters. This global constraint structure is
labeled the organizational invariant. Nested within these global
constraints are a set of local constraints or tuning parameters by
which a movement is tailored to the specific details of the task's
actor-environment context. We suggest that one can gain a better
experimental portrait of an action type's organizational invariant
by systematically increasing the degrees of freedom controlled and
observed in the experimental task. Finally, we also suggest that
motor memory phenomena in reproduction paradigms may be intimately
related to the degree of persistence of a movement's local tuning
constraints.

FOOTNOTES

1Most of the work in the area of motor memory has been done by
researchers in physical education, kinesiology, and human performance, while
control is a much larger field. Even in the area of control, however, some
apparently simple problems have resisted consensus. For example, Stein (in
press) poses the question "What muscle variables does the nervous system
control?" without providing a definitive answer, yet this question has been on
the neuroscientist's mind for at least 30 years.

2At a larger scale, attributing a person's erroneous behavior in certain
laboratory tasks to a lesion in the frontal lobe leads to elegant cause-effect
neurological models of apraxia. Unfortunately, such models are embarrassed if
not infirmed by the patient's ability to perform the same tasks when the
situational context is sufficiently rich (e.g., wife to husband: "Hang that
picture on the wall," versus neurologist to patient: "Show me how you hammer
a nail," cf. Kelso & Tuller, 1981).

3Note that the "generalized IQ" of such special purpose devices may be
quite low. The polar planimeter, for example (cf. Runeson, 1977), is a rather
simple mechanical device that provides a sensitive measurement of the area of
a bounded planar figure. However, it can perform only crude measurements of
the conceptually "simpler" perimeter length of the figure.

4Introspection as a methodology for psychology has had its day, but it
can often help us to appreciate the nature of the problem. In the case of
motor memory, what actually is remembered? A movement? Or a piece of it such
as a cue? If the reader was asked what movement she produced yesterday at
3:00 p.m., how would she respond? If anything is remembered, it would be task
referential (like drinking, going to the toilet, talking to a colleague), but
the movements associated with such actions are hardly remembered. In riding a
bicycle after many years, what is remembered? Hardly a sequence of movements.
More likely it is the capability to transform the system
(person-bicycle-environment) such that the right properties are revealed, i.e., that
transformation across the links of the body that allows one to achieve equilibrium on
an unstable object. 

5The reader should note that the present use of parameter tuning is
distinct from two previous uses of the term "tuning" (i.e., spinal tuning and
biomechanical tuning) in the motor control literature. Spinal tuning
describes physiological patterns of modulation of the spinal cord elements as
discussed by Gelfand, Gurfinkel, Tsetlin, and Shik (1971), Gurfinkel, Kots,
Krinskiy, Paltsev, Fel'dman, Tsetlin, and Shik (1971), and Kots (1977).
Biomechanical tuning (cf. Greene, Note 1; Saltzman, 1979) is defined relative
to skeletal joint motions and muscle forces. In this biomechanical sense, a
movement can be described by the contributions of main biomechanical variables
and tuning biomechanical variables. Main variables provide a joint motion or
muscle force pattern that roughly approximates a desired movement pattern. 
Tuning variables are used to improve the movement approximation provided by 
the main variables. 

6At first glance, organizational invariants and tuning parameters appear
similar to the concepts of generalized motor programs or schemas and variable
parameters (cf. Keele, 1981; Pew, 1974; Schmidt, 1975, 1980), respectively.
They are quite distinct, however. The latter concepts are based on a
movement's observable kinematic or kinetic features (e.g., movement time,
measured force output, muscle/joint groups, etc.), whereas the former are
based on the movement's underlying dynamic parameterization, which gives rise
to its kinematic/kinetic observables.

7The mass-spring model of position control at a single joint is appealing
within this framework since it provides a movement with intrinsic temporal
coherence, i.e., the movement's duration is specified by the system's mass and
stiffness parameters. It is impossible by definition, however, to talk of
spatial coherence across joints in single joint motions. Thus, in our later
discussions of a generalized mass-spring model for multiple degree of freedom
positioning tasks, we will suggest a possible way to define such synchronic
constraints with reference to underlying dynamic parameters.

IS THE "COGNITIVE PENETRABILITY" CRITERION INVALIDATED BY CONTEMPORARY
PHYSICS?*



Peter Kugler, M. T. Turvey,+ and Robert Shaw++



Pylyshyn (1980) advocates and extends a popular view that the model
source for the explanatory concepts of cognitive science is the science of
formal symbol-manipulating machines. The argument is that the proper
vocabulary for constructing adequate explanatory theories of the knowings of animals
and humans is the representational-computational vocabulary of computational
science and artificial intelligence.

The representational-computational perspective on knowings is far from
recent; it has appeared in various forms for over two millennia, being
intimately linked with philosophical attitudes variously termed
"representational realism," "indirect realism," "idealism," and "phenomenalism." By and
large, these attitudes follow from a distinction between the "physical" object
of reference and the "phenomenal," or intentional, object that is said to be
directly experienced and to which behavior is referred. It has been
commonplace over the ages to question the coordination of the two kinds of objects,
and it has seemed a simple enough matter to identify slippage between them.
In consequence, it has frequently been concluded that the reference object
might just as well be excluded from explanatory accounts; there are doubts
that it can be known, and even doubts that it actually exists. The
representational-computational vocabulary and its allied philosophical
postures question or deny that the world is knowable. Animals and humans can
only know the phenomena (sense data, representations, etc.) that their brains
or minds supply (see Fodor, 1980). In sum, philosophy and science have been
unable to characterize the animal-environment relation in a way that allows
that what animals know is real, existing independently of their knowing it.
This state of affairs is curiously tolerated despite its obvious contradiction
of the scientific enterprise (see Shaw & Turvey, 1980, on Fodor, 1980).

Among the many assumptions and intellectual commitments that prohibit a
realist posture (see Shaw & Turvey, 1981; Shaw, Turvey, & Mace, 1982) is the
assumption that contemporary physical theory is complete. The complete
theory's failure to accommodate regularities in biology or psychology gives
license to propose new, often special (in the sense of extraphysical)
principles. Pylyshyn proposes "cognitive penetrability" as a methodological
criterion that is sufficient (but not necessary) to distinguish those
phenomena whose explanation requires the privileged vocabulary of representation and
computation from those phenomena that can be appropriately described by
physical law. Our reading of what is necessary for the "cognitive
penetrability" criterion is a good deal more general than Pylyshyn's, but we believe it
to be accurate. The necessary condition is that the behavior of the system in
question be nondeterminate, that is, not dominated by boundary and initial
conditions. As we describe below, this necessary condition is met by a broad
class of physical systems termed "dissipative structures," systems that are
indeed "mere" instantiations of physical law and, therefore, by the criterion,
systems that do not entail the representational-computational vocabulary. It
seems to us that the criterion is diluted, if not invalidated, by recent
extensions of physical theory. Because of this fact, we question its
completeness and its propriety for natural phenomena.

Before turning to a description of dissipative structures, let us remark 
on an aspect of Pylyshyn's argument that we find especially puzzling: the
conjunction of Pylyshyn's pursuit of nondeterminacy as the necessary condition
for genuine cognitive processes and his advocacy of formal symbol-manipulating
machines as the model source for explaining such processes. Pylyshyn wishes
to earmark for cognitive science behavior that is not determinately bound to
environmental events; such behavior, it is argued, can be accounted for
exclusively by the representational-computational vocabulary. However, no
suggestion is given of how the various algorithms and representations are to
be nondeterminately selected. Computational devices are all determinate
machines in which the output is completely specified by the initial conditions
(input) and boundary conditions (algorithms and representations). Oddly, by
selecting the formal symbol-manipulating machine as his model source,
Pylyshyn, like other proponents of his view, fails to offer any nontrivial
distinction between the popular model of cognition and any prototypic behaviorist
model, that is, between cognitive science and behaviorism.

Dissipative structures as consequences of conditions on natural law. An
analogue to Pylyshyn's "penetrability" condition can be shown to exist in
physical systems governed by natural law when such systems are construed as
dissipative structures. Although this idea requires careful and complete
development, a sketch of the argument can be given as follows: Classical
reversible equilibrium thermodynamics describes the thermodynamic behavior of
a system only when the system is in or near a state (condition) of
equilibrium. In addition, the system may exchange neither matter nor energy
with its surrounds. Systems meeting these conditions are referred to as
isolated closed systems. The behavior of these systems is characterized by a
tendency to run down to a maximum state of disorder, zero information, and
loss of the ability to do work (Bridgeman, 1941). This behavioral state is
entropic equilibrium, and once a system is in this state nothing new can
emerge as long as the conditions of the system remain isolated and closed.
Under these conditions, the thermodynamic analysis is complete. The
reversible quality of these systems is evident in the fact that if a perturbation
occurs to the system under these conditions, the system responds by going
through a succession of states, all of which are at entropic equilibrium. In
short, the entire event occurs in a state space in which all points in the
space are homogenous with respect to entropic equilibrium. The concept of
reversibility is reflected by the fact that there are no preferred points in
the entropic state space: States may reverse themselves and still maintain
the condition of entropic equilibrium. Under these conditions the system's
behavior is completely determinate and specified by initial and boundary
conditions. Such conditions do not allow for the possibility of autonomy or
self-organization. While some real events (such as very slow processes in the
macroworld) are rather well described by the conditions surrounding classical
reversible equilibrium thermodynamics, most interesting events regarding
biological and psychological systems are not.

Our suspicion is that Pylyshyn's concept of "natural laws" is based on
the above conditions, namely those of an isolated, closed (thermodynamic)
system. We would suggest, however, that a model for a biological or cognitive
system is poorly represented by the conditions of isolated, closed systems. A
more appropriate model might be found in the less familiar conditions of open
physical systems (that is, systems that exchange energy and matter with their
surrounds). While the natural laws pertaining to open conditions are the same
as those pertaining to closed conditions, systemic behavior under these two
conditions is dramatically different. In particular, when certain conditions
manifest themselves, the behavior of open systems need not tend toward a state
of thermodynamic equilibrium but more generally toward a steady state regime
displaced from equilibrium and maintained by a continual flow of free energy
and matter into and out of the operational component of the system (Iberall,
1977, 1978-a, 1978-b, 1978-c; Morowitz, 1978; Prigogine, Nicolis, Herman, &
Lam, 1975). The necessary conditions for such behavior are:

1. A reservoir of potential energy from which (generalized) work can
arise;

2. A microcosm of elements with a stochastic fluctuating nature;

3. A presence of nonlinear components;

4. A scale change such that a nonlinear component is critically
amplified (in the sense that the system's own dimensions now resist the previously
dominant effects of the initial and boundary conditions).

If these conditions are present (see Szentagothai's, 1978, commentary on
Puccetti & Dykes, 1978), then the possibility exists for the transition from
the stochastic steady-state condition to a spatially structured steady-state
condition or a time-dependent limit cycle regime characterized by homogeneous
oscillations or by propagating waves. These regimes are stable in virtue of
the amplified nonlinear components, and are maintained in virtue of the
"dissipation of energy." The manifestation of these open systems is hence
achieved by drawing spontaneously on potential energy sources, so as to remain
stable in the nonlinear sense while dissipating energy (that is, so that there
is a greater loss of order in the surround than the gain of order by the
system itself; the behavior of such systems is said to be "lossy" with respect
to energy). Prigogine (Glansdorff & Prigogine, 1971; Nicolis & Prigogine,
1978) has termed such systems "dissipative structures" to illustrate that
their formation and maintenance require a continuous flow of matter and energy
from an outside source. The behavior of dissipative structures is prototypic
of thermodynamic engines (cf. Iberall, 1977; Yates & Iberall, 1973) in
that the mean states of the internal variables are characterized by fluxes
and "squirts" of energy that become constrained by nonlinear components so as
to behave in a limit cycle manner (Katchalsky, Rowland, & Blumenthal, 1974;
Kugler, Kelso, & Turvey, 1980, 1982; Winfree, 1967; Yates, Marsh, & Iberall,
1972). In this manner such systems resolve the internal degrees of freedom
problem that manifests itself so blatantly in formally closed, artifactual
systems. Whereas artifactual systems are not capable of self-organization or
autonomy, dissipative structures reveal possible insights into such problems.



In particular, dissipative structures are associated with a situation
called "order through fluctuation" (Prigogine, 1976). Under the above
conditions, certain structures may arise from the amplification of fluctuations
resulting from an instability of a "thermodynamic branch." Because symmetry
is broken, new structures are formed. These new structures may possess new
functions that correspond to a higher level of interaction between the system
and its environment (Prigogine & Nicolis, 1971, 1973). The symmetry-breaking
instabilities are dependent on scale factors, and the concomitant bifurcation
points in the fluctuation phase provide places where the autonomy of the
dissipative structure exerts itself. While the thermodynamic branches
themselves are determinately specified by stability and bifurcation theory, the
actual choice of which branch (stability mode) the system enters may
ultimately be non-determinately specified by a dimension intrinsic to the
system (as opposed to determinately specified, a notion associated with closed
systems, or indeterminately specified, as associated with a randomizing
component). If, however, a system is composed of sufficiently small numbers of
fluctuating elements, the system's behavior will be dominated by the boundary
and initial conditions and can never exhibit autonomy (Nazarea, 1974).

It is only when a system is "scaled up" beyond some critical dimension
that nonlinearities are able to be sufficiently amplified to lead to some
choice between various solutions (thermodynamic branches; Hanson, 1974). Only
under these conditions do the system's own dimensions become sufficiently
influential to resist the previously dominant effects of initial and boundary
conditions. It is at this point that the system achieves some autonomy with
respect to the outside world and may be said to be nondeterminate. In other
words, prior to the scaled-up condition, the system behaves in a determinate
fashion; after the critical condition is reached, the system's behavior
becomes nondeterminate and autonomous on some dimensions, an autonomy that may
be manifested in the macrostructure of the system's behavior.

Under these conditions the behavior of the system is not "causally"
linked to the environmental conditions and therefore might be said to come
under the so-called penetrability criterion. But should we be willing to say
that cognitive factors enter into systems simply because such weak links exist
in the causal chain? To answer yes would be tantamount to ascribing the
epithet "cognitive" to systems considerably less evolved than humans and not
necessarily animate. That cognitive factors might enter is clearly a
hypothesis that goes far beyond the mere existence of nondeterminacy in a
system's linkage to its environment. For this reason, it seems to us that
Pylyshyn has failed to make a cogent case for the usefulness of his "cognitive
penetrability" criterion. For, to accept it we would either have to consider
the possibility of beliefs, motives, and the like entering into purely physical
systems of the dissipative variety, or have to ignore their existence
altogether.

Finally, we note that a dissipative system will manifest a stable
regularity on certain dimensions of its behavior owing to the nonlinear
components that have been sufficiently amplified. This regularity may be
disrupted if the system falls below the critical scaling conditions. However,
if the system is in the critically stable domain, then any perturbation on the
input side of the system will only temporarily disrupt the system's
regularity. In addition, the regularities are not necessarily contingent on
their material substrates (Thom, 1975). Systems sharing the same dimensionality
but not necessarily the same substrate will share a common set of stable
regularities. (This, we would claim, is the physical equivalent of Pylyshyn's
transparency condition.)
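
The scaling argument of the last few paragraphs can be given a minimal
numerical sketch. The example is ours and is not taken from the cited
literature: it assumes the amplitude equation dr/dt = r(mu - r^2), the radial
part of a Hopf normal form, as a stand-in for a system with one amplified
nonlinear component. The parameter name mu (standing for the scale factor) and
the function name settle_radius are hypothetical. Below the critical value
(mu < 0) every initial amplitude decays to zero; above it (mu > 0) every
initial amplitude, small or large, settles on the same limit cycle radius, so
that a perturbation disrupts the regularity only temporarily.

```python
def settle_radius(mu, r0, dt=0.01, steps=20000):
    """Integrate dr/dt = r * (mu - r**2) with forward Euler steps.

    mu  -- assumed scale parameter; the critical value is mu = 0
    r0  -- initial amplitude (a perturbed state of the system)
    Returns the amplitude after steps * dt units of time.
    """
    r = r0
    for _ in range(steps):
        r += dt * r * (mu - r * r)
    return r


if __name__ == "__main__":
    # Above the critical scale: different starting amplitudes, same regime.
    print(settle_radius(1.0, 0.1))
    print(settle_radius(1.0, 3.0))
    # Below the critical scale: the oscillation dies out.
    print(settle_radius(-1.0, 1.0))
```

On this reading the stable radius depends only on mu and not on the initial
condition, which is one way of picturing a regularity that belongs to the
system's own dimensions rather than to its initial and boundary conditions.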

A hint at how "cognitive" phenomena might be explained in the
nonprivileged vocabulary of physical theory. Here we consider a phenomenon
that in its apparent organizational complexity is, on prima facie grounds, not
unlike the phenomena of interest to cognitive science. Our purpose is to show
how phenomena of this kind might not require a privileged vocabulary for their
explanation and how a realist perspective on such phenomena might be pursued.

Representations and algorithms, while introduced as a convenient way to
inquire into the organization of systemic activity, very often assume
ontological reference apart from inquiry (Dewey & Bentley, 1949). With this
assumed status, it is tempting to put such "between things" that coordinate
animal and environment into the role of explanatory first principles. For
example, if one says that the relation between aspects of a system's input and
aspects of its behavior is programmatic, then one is tempted, with regard to
the input aspects, to attribute the systematicity of the system's behavior to
the systematicity of a program, and in the case of biological systems, to
assign this new object a location somewhere in the animal's nervous system. To
equate a program with the causal basis of a behavior is not only to introduce
sui generis a special explanatory principle, but is, additionally, to
subscribe to a view in which the orderliness of a phenomenon is said to be owing
to an explicit, a priori description of that orderliness. In summary, a
program or representation is conceived as an ordering of details that precedes
a behavior and is causally responsible for the ordering of behavioral details.

The goal of the realist's style of inquiry is to minimize first
principles: rigorously considering the reciprocity among complementary
components as a global property, many "between things" sui generis may prove
unnecessary to account for the animal-environment relationship (Kugler et al.,
1982; Shaw & Turvey, 1981). Under the constraints of this style of inquiry,
the orderliness of a systemic phenomenon (such as a behavior) is not owing to
an a priori prescription for the system but rather is an a posteriori fact of
the system, that is to say, a property that arises from within the system
during the course of the system's existence. Any explanation of a natural
systemic relation that appeals to some a priori embodiment of that very
relation would be rejected by the above perspective; for such an explanation
is a step toward phenomenalism and a step away from realism and, in
consequence, a step away from a unified view of physical explanations
regarding natural phenomena. By the precepts of a realist's view the appeal
to a mediating factor or a "between thing" as an a priori source of
behavioral order arises from an incorrect perspective on the phenomenon.

To illustrate this point, let us apply the dissipative structure story
developed above to the phenomenon of insect architecture. Consider, for
example, the early phase of termite nest building, in which pillars and walls
are constructed sufficiently close together to permit the formation of arches.
The construction proceeds in two stages: In the first stage building
materials are randomly deposited. In the second the termites tend to
aggregate and to accumulate material at far fewer sites than the number of
original deposits.

An individual termite relates to its surroundings chemotactically, moving
on a local chemical gradient. The attractant is a scent the termites
contribute to the building material during their manipulations. When the
accumulation of building material through random deposits is low and the
number of deposits relatively few, the diffusion of the scent is homogeneous
over the area in which material is being deposited. This means that as far as
the individual termite moving on local gradients is concerned, any locale is
as good as any other. Imagine now a termite moving through the building area
after some amount of random depositing has occurred. The greater the number
of random deposits, the greater the likelihood that an individual termite will
pass in the neighborhood of a deposit. In terms of the attractant's diffusion
in the air, the place of a deposit defines a local maximum, a place where the
density of pheromone molecules is at its greatest. In the neighborhood of a
deposit, therefore, chemotaxis is biased toward the coordinates of the
deposit. In consequence, a place where a deposit has been made is a place
that "invites" further deposits to be made. Speaking formally, the latter
identifies an autocatalytic reaction: the accumulation of material at X is
increased by the very presence of material at X. The criticalness of this
autocatalytic component rests with an appreciation of the fundamentals of
nonequilibrium, irreversible thermodynamics, that is, with the fundamental
character of open systems. A further exposition of open systems will permit
us to take the next step toward an understanding of the architectural
achievement of termites as a necessary a posteriori fact.

For an open system there must be a source of high potential energy and a
low potential energy sink such that in the drawing of energy from the higher
order form and relegating it to the lower order form, work is done in a
generalized fashion. More commonly, we say that across the boundaries of an
open system matter and energy are continuously in flux. As described above,
open systems are consistent with familiar thermodynamic law in that, being
dissipative, their operations lead to a net increase in entropy on the global
scale. At the same time, however, these very operations generate negentropy
or structure on a local scale. The emergence of a (new) structure depends on
the presence of nonlinearities in the system and a sufficient change of scale
in one or more system dimensions.

Fluctuations, understood as spontaneous deviations from some average
macroscopic behavior, will always occur in an open system with many degrees of
freedom. When the fluctuations, and hence the deviations, are not large (such
as might be the case at low fluxes of energy), the response of the system is
usually to restore the original state, that is, to move as close as possible
to maximum entropy and hence away from structuralization. However, the
presence of nonlinearities, combined with a scaling upward of, say, energy
flux, allows for a pronounced amplification in fluctuations, such that the
system is driven to a new average state of fewer degrees of freedom. In
short, where an open system with nonlinearities is at a critical distance from
equilibrium, a new structure emerges.

Returning to termite architecture, the autocatalytic reaction, by which
the presence of material at a site stimulates the depositing of more material,
is a nonlinear contributor to the dynamics of the termite-nest system. As the
random depositing proceeds, some sites will accumulate more material than
others. Such being the case, the nonlinear autocatalytic factor determines
that, given two sites with unequal accumulations, the site with more material
will grow at a faster rate than the site with less material. In the spatial
diffusion of the pheromone molecules, marked inflections will appear in the
diffusion space defining "preferred" locations on which the chemotactic
trajectories of the termites will converge. The diffusion space is no longer
homogeneous; the previous stable state of affairs, characterized by the random
depositing behavior of the termites, gives way to instability and, in turn, to
a new stability, a stage of activity in which the termites "coordinate" their
individual activities at certain sites, producing, by their combined efforts,
pillars and walls. Now, if in a certain area two large deposits are in close
proximity, then we may suppose that within that area the distribution of
pheromone molecules will articulate gradients pointing toward a local region
of greatest density between, and approximately at the height of, the two
deposits. One can intuit how termite movements on these gradients, according
to the simple chemotactic principle, will eventuate in links between the two
proximate deposits, that is, in the formation of arches.
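
The two-stage account of depositing can be sketched in a few lines of code.
What follows is a toy illustration under assumptions of our own, not a model
of termite behavior from the literature: each deposit lands on a site with
probability proportional to (material already there) raised to a power, and
the names simulate_deposits, attraction, and top_share are hypothetical. With
attraction = 0 the depositing is purely random; with attraction > 1 the
presence of material at a site nonlinearly "invites" more material, which is
the autocatalytic component described above.

```python
import random


def simulate_deposits(n_sites=50, n_steps=5000, attraction=2.0, seed=0):
    """Drop n_steps units of material onto n_sites sites, one at a time.

    The chance of choosing a site is proportional to material ** attraction,
    an assumed stand-in for chemotaxis toward pheromone-laden deposits.
    """
    rng = random.Random(seed)
    material = [1.0] * n_sites  # a thin, homogeneous initial layer
    for _ in range(n_steps):
        weights = [m ** attraction for m in material]
        site = rng.choices(range(n_sites), weights=weights, k=1)[0]
        material[site] += 1.0
    return material


def top_share(material, k=5):
    """Fraction of all material held by the k largest sites."""
    return sum(sorted(material, reverse=True)[:k]) / sum(material)


if __name__ == "__main__":
    random_case = simulate_deposits(attraction=0.0)
    autocatalytic_case = simulate_deposits(attraction=2.0)
    print("random depositing, share of top 5 sites:", top_share(random_case))
    print("autocatalytic depositing, share of top 5 sites:",
          top_share(autocatalytic_case))
```

No site is given a plan, yet in the autocatalytic case the material ends up
concentrated at far fewer sites than in the random case, which is the sense in
which the resulting form is an a posteriori fact of the system.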

In Prigogine's terms (Nicolis & Prigogine, 1978; Prigogine, 1976;
Prigogine & Nicolis, 1971) the termite nest is a dissipative structure: a stable
organization that is maintained away from maximal entropy through the
degrading of a good deal of free energy. The form of the nest arises as an a
posteriori fact of the termite ecosystem. It is not owing to a plan or
program invested a priori in the individual termite or in the "collective"
termite. That self-actional explanation, which would make "plan" a principle
sui generis, is replaced by an explanation of greater generality that is
consistent with physical theory.

Terms such as "algorithm" and "memory" are commonly used in inquiry to
fulfill the role of an a priori ordering principle. Obviously, from the
arguments presented here, such terms and the roles assigned to them are
suspect and may well owe their existence to an improper analysis of the
physical conditions surrounding the phenomenon they are meant to account for.

Concluding remarks. We have argued that the necessary condition for
"cognitive penetrability," conceived in its most general form, fails to
segregate those phenomena requiring the privileged vocabulary of
representation and computation from those accommodated by the nonprivileged
vocabulary of physical theory. We have further questioned the propriety of
the representational-computational vocabulary being used to reject realism
simply because epistemic relations between animal and environment may lack a
deterministic character. Consequently, we suspect that the search for
fundamentals in cognitive science would fare better in the long term if it
chose a model source that embraces the conditions of autonomy and
morphogenesis as an a posteriori fact in the spirit, perhaps, of Piaget
(1977), Prigogine (1978), or BarrUl (1961), namely the vocabulary of physical
theory, rather than a model source that embraces conditions of neither kind,
namely the vocabulary of formal machine theory. Admittedly, this suspicion, if
valid, seriously reduces the promise of any immediate gratification from the
very popular representational-computational approach to cognitive phenomena.
But perhaps it would not be too harmful to ask computer scientists who address
cognitive issues to temper their hubris, since the difficulty of the search
for a scientific basis to realism counsels the need for considerable patience.

1. INTRODUCTION

One of the most popular tacks taken to explain cognitive processes likens
them to the operations of a digital computer. Indeed, the tasks for the
cognitive scientist and the artificial intelligence scientist often are seen
as indistinguishable: to understand how a machine or a brain "can store past
information about the world and use that memory to abstract meaning from its
percepts" (Solso, 1979, p. 423). The fact that there are machines that appear
to do this, to varying degrees of success, is often taken to imply, almost by
default, that cognition would have to embody the same steps in order to
achieve the same results. In what follows, we shall outline our objections to
this attitude and consider briefly some alternatives.

2. A CHARACTERIZATION OF COMPUTATIONAL APPROACHES TO COGNITION

The prototypic embodiment of the computational view is to be found in the
early work of A. M. Turing who, guided by his introspections of how he
computed, designed a hypothetical machine that could be programmed to compute
any function that was computable by algorithm. If an algorithm could be
written to describe a particular cognitive function, then the Universal Turing
Machine could be programmed to execute that function. By extension, if the
machine could be made to "act like a human," that accomplishment was meant to
provide insight as to how a human acts. Of course, the universality of the
Turing Machine benefited from its hypothetically infinite memory capacity,
hypothetically perfect reliability, and a computational speed that,
hypothetically, could be as fast as the task required. In short, Turing's
"invention" was meant to be an ideal device operating under ideal conditions.

Such a device has appealed to students of cognitive phenomena on several
fronts. The reason for this allure is obvious if we examine the framework
within which most cognitive scientists operate. At bottom, it is assumed that
the sense data with which a perceiver is provided relate equivocally to their
sources in the environment. The input to the brain is so referentially opaque
that the meaningfulness must somehow be restored (or recovered) by the
perceiver. This is accomplished by means of internalized cognitive procedures
that operate on the sense data, combining and transforming them in various
ways until, finally, a reasonable facsimile of the world has been constructed.
If these cognitive procedures can be captured in the form of algorithms, it
means that they can be executed without the intervention of a mystical agent.
To construe procedures as a sequence of simple, discrete, deterministic, and
finite instructions that can be executed by a machine presumably rid cognitive
science of the omniscient goblins that too often seemed to creep into accounts
of knowing the world.

No doubt, this framework of indirect realism (knowing the world through
an internally constructed and stored representation of it) contributes to the
vigor with which cognitive science has embraced computational science. When
it is coupled with the early belief that neurons, which are the substrate of
the cognitive machinery, have only the same discrete character as switches
(i.e., they either fire or do not fire; they are either on or off), which are
the substrate of the computational machinery, the marriage of mind with
computer seemed ideal. Unfortunately, ideal properties have little to do with
the natural circumstances in which knowers of the real world find themselves.
It is from this perspective that we will initiate our criticism of
computational approaches to cognition.

3. FAILINGS OF THE COMPUTER METAPHOR

3.1 The Emphasis on Logic Is Misplaced

A Universal Turing Machine is an ideal mathematical object; it represents
a formal manipulation of symbols and owes allegiance to criteria of logical
consistency but not to physical laws and constraints. Thus, for example,
physical variables play no essential role in the concept of algorithm. In
reality, however, every logical operation occurs at a minimum cost of kT of
energy dissipation (where k is Boltzmann's constant and T is temperature) and,
in fact, occurs at a much higher cost to insure reliability.
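
To give the kT figure a concrete magnitude, the following sketch simply
multiplies it out. The function name min_dissipation_joules and the choice of
300 K (roughly room temperature) are our illustrative assumptions; the value
of Boltzmann's constant is the exact SI value. The result is a floor only; as
noted above, operations carried out reliably cost much more.

```python
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann's constant k, in joules per kelvin


def min_dissipation_joules(n_operations, temperature_k):
    """Lower bound on dissipated energy at kT per logical operation."""
    return n_operations * BOLTZMANN_J_PER_K * temperature_k


if __name__ == "__main__":
    # One operation, and a billion operations, at an assumed 300 K.
    print(min_dissipation_joules(1, 300.0))
    print(min_dissipation_joules(1e9, 300.0))
```

At 300 K a single operation costs at least about 4.14e-21 joules. The cost is
tiny, but it is a physical variable all the same, and it scales with both the
number of operations and the temperature of the system that performs them.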

Of course, a computer instantiation of a formal operation entails the
dissipation of energy, but what distinguishes the computer from the animal in
this respect is that the computer has a single demand (computation) on
relatively unlimited energy resources, whereas the animal has multiple demands
on limited energy resources. For sound physical reasons, a formal operation
that is logically possible and biologically realizable may not be useful. It
is acknowledged among those who would simulate "mind" on a computer (e.g.,
Marr, 1976) that the construction of an algorithm for some purpose is
trivially fettered. Algorithms can be like "just so" stories (a designation
that highlights excessive imaginativeness about causalities, as in Kipling's
account of how the elephant got its trunk) in the absence of a serious attempt
to view them in the context of the physical biology of the system for which
they are intended.

To be redundant, the mere existence of an algorithm does not constitute
an explanation of a phenomenon. That is to say, simply because an algorithm
can be written to simulate a given activity of an organism, it does not
necessarily follow that the organism uses such an algorithm in performing the
activity in question (Cummins, 1977). The algorithm is merely a description
of the activity; it may be just one of several alternative descriptions.
While we as scientists might need a description in order to talk about a given
activity, and an artificial device needs an algorithm in order to simulate
that activity, natural systems do not require explicit instructions in order
to perform their natural activities. On the contrary, for natural systems it
is largely the free interplay of forces, not a priori prescriptions, that
realize stationary and transitory states. The significance of considering a
system's continuous dynamical processes will figure repeatedly in this paper.

3.2 Discrete Operations Are Overvalued 

Underlying the equating of cognition with computation and representation
is the thesis that intelligence can be accounted for or simulated by discrete
happenings in automata. It is claimed that, just as continuous functions and
variables can be represented by a finite set of discrete symbols and rules, so
can intelligent operations of mind. Thus, for any system of sufficient
complexity to be ascribed the epithet "intelligent," one particular type or
mode of systemic functioning (the discrete symbolic mode) is advanced as the
only aspect of the system's behavior that is significant to understanding its
intelligence.

This thesis is fundamentally flawed. To anticipate, for any complex
system, the label "intelligent" belongs most legitimately to the dynamical
mode that creates and interprets the discrete mode, and less legitimately to
the discrete mode that is (merely) a product (cf. Pattee, 1974).

To fix this idea of discrete mode, consider a continuous dynamical system
such as the motion of a number of particles in a potential field. To describe
this process, the physicist uses a few equations relating a small number of
symbols. That is, by ignoring most of the details, a rate-dependent process
is translated into a rate-independent structure. In expressing an
understanding of the continuous process through a discrete set of equations,
the physicist is said to be operating in the discrete, symbolic mode. It is
universally recognized that this discrete, symbolic mode is essential for
clear and exact descriptions and it would be universally recognized that the
physicist exhibited an act of intelligence in arriving at the abstractions in
question.

To further fix this idea of a discrete mode, we note that observers of
nature may not be alone in its use: Biological systems in general may rely on
discrete (self-) descriptions for their successful functioning. A prime
example is the genetic code, a rate-independent structure (as far as we can
tell) that relates the nucleotide symbol vehicles to their corresponding amino
acids. There are two notable features of this particular example of a
discrete description. First, the genetic code qua description is simple and
incomplete relative to the detailed continuous dynamics that it controls. The
structure of the amino acids and how they are to fold and operate as
rate-controlling enzymes are processes involving tens of thousands of
interacting degrees of freedom. Second, the meaning of the genetic code cannot
be assessed by transformations or translations into other discrete
descriptions. Little headway is made toward interpreting the meaning of the
code by transcribing the DNA strings into messenger RNA strings or messenger
RNA strings into linear polypeptide strings. One string is as good a
description as any other, and all fail to convey the meaning. Rather, the
interpretation or meaning is in the folding process, a continuous dynamical
process, which is not self-described in the cell's code (Pattee, 1974, 1977;
Waters, Note 1). We can only explain the meaning of the DNA string by reference
to the dynamical mode that it complements. Moreover, it makes a lot of sense to
argue that we can only explain the origin of the DNA string by reference to
continuous dynamical processes.

The discrete mode might be characterized, generally, as singularities
condensed out of continuous dynamics, a characterization that is consonant
with recent attempts to generate biological organizations from the
singularities of a dynamical topology (e.g., Shaw, 1980; Thom, 1973). This
characterization, however, will be considered incomplete to the extent that
one believes that the discrete mode must be structurally embodied (and a
fortiori that structure and function are complements). The genetic code is
said to be embodied by the DNA string and specific structural embodiment is
advanced as a criterial property distinguishing rate-independent rules from
rate-dependent laws (Pattee, 1973; Yates, 1980). It is not clear that the
occurrences that dynamical topology attempts to portray, such as bifurcations,
have a structural embodiment; they do not appear to be associated with symbol
vehicles, to use Pattee's terminology. Even granting that the singularities
of a dynamical topology might produce embodiments, there would remain
unanswered the question of the origin of the privileged status of the genetic
code as a suppressor of some select, dynamical degrees of freedom.

The DNA string is the most carefully studied example of the discrete mode
of description in a natural context that we can currently lay our hands on.
It is illuminating in this respect: In natural systems a discrete description
can be neither created nor interpreted by the discrete mode. The strong
implication is that the discrete mode of symbolic description that is
characteristic of automata models of intelligence is insufficient for the task
of capturing natural intelligence. The dynamical mode missing from a putative
computer simulation of intelligence is to be found only in the writing of the
computer programs and in the reading of the computer outputs.

What kind of machine, therefore, is more apt for the task of simulating
intelligent activity? One answer would be a machine that executes in two
complementary modes, the dynamical and the discrete (see Section 4). It would
be a mistake to assume that a more accurate simulation of intelligent activity
can be achieved by automata that perform parallel rather than sequential
computations if by "parallel" is meant discrete operations occurring
concurrently. Elaborating the discrete mode of functioning will be of little
avail in the absence of complementary, continuous dynamical processes (Pattee,
1974). It would, of course, enhance the computer as an extension of human
capabilities, but that is a very different matter.

3.3 Self-complexing Is Not Possible in the Discrete Mode

As noted above, one perspective on the origin of discrete elements is
that singularities emerge from extensive changes in an underlying continuum.
As long as there is a continuous dynamical process and the possibility of
variation in the magnitude of certain dimensions, then new (in the sense of
qualitatively different) discrete events (or observables or descriptions) can
emerge. The evolution of structure that is incident to scale changes in one
or more variables affecting an open system is becoming increasingly better
understood (e.g., Prigogine, 1980; Soodak & Iberall, 1978). For the present,
we wish to recognize that where continuous dynamical processes are
artificially suppressed, as in formal automata theory or in a computer model of
some aspect of cognition, the intrinsic generation of new primitives is
precluded. A system executing solely in the discrete mode cannot self-complex.
The general argument is that any system whose present competence is defined by
a logic of a certain representational power cannot progress through operations
in the discrete mode to a higher degree of competence (e.g., Fodor, 1975).

Suppose the operations in the discrete mode are the projection and
evaluation of hypotheses. An hypothesis is a logical formula, as is the
evidence for its evaluation, and both formulae must be expressed in the
discrete symbols of the system's internal language. If the evidence is
sufficient to confirm the projected hypothesis, then the fact to which the
hypothesis corresponds can be registered in the representational medium.
Importantly, however, the range of hypotheses projected and the range of
evidence considered are both restricted to the expressive range of the symbols
available to the system. Any hypothesis or any evidential source that must be
expressed in symbols other than those available cannot be entertained. In
sum, a system executing solely in the discrete mode cannot increase its
expressive power. It cannot develop the capacity to represent more states of
affairs at some later date than it can represent in the present. What it can
do is to distinguish, within limits, states of affairs that occur from those
that do not. The order of complexity achievable by a system executing solely
in the discrete mode is frozen; it is determined by the order of complexity
with which it began. How is the order of complexity raised in a system with
no continuous dynamical processes, such as a computer? By coupling it to an
external intelligent device (a programmer) that writes in new symbols and
discrete rules.
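
The closure argument of the preceding paragraph can be illustrated with a minimal sketch in Python. Everything named here is hypothetical and introduced only for illustration: the two-predicate vocabulary, the dictionary encoding of evidence, and the helper functions are assumptions, not anything specified in the text. The sketch assumes that hypotheses are conjunctions of primitive predicates drawn from a fixed vocabulary.

```python
from itertools import combinations

# Hypothetical primitive predicates available to the system (an assumption
# made for illustration; the text names no particular vocabulary).
VOCABULARY = {
    "red": lambda obj: obj["color"] == "red",
    "round": lambda obj: obj["shape"] == "round",
}


def expressible_hypotheses(vocabulary):
    """Enumerate every conjunction of primitives the system can project."""
    names = sorted(vocabulary)
    return [frozenset(combo)
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


def can_entertain(hypothesis, vocabulary):
    """A hypothesis is projectable only if all of its symbols are available."""
    return set(hypothesis) <= set(vocabulary)


def confirmed(hypothesis, evidence, vocabulary):
    """Evaluate a projectable hypothesis against a list of observed objects."""
    if not can_entertain(hypothesis, vocabulary):
        raise ValueError("hypothesis uses symbols outside the vocabulary")
    return all(vocabulary[name](obj) for name in hypothesis for obj in evidence)
```

Under these assumptions the set of projectable hypotheses is fixed by the vocabulary: a hypothesis mentioning a predicate such as "heavy" cannot be evaluated until someone outside the system adds an entry to `VOCABULARY`, which is the role the text assigns to the programmer.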

To summarize, when information used by a system is construed linguistically,
that is, ignoring the relationship between symbols and dynamics, it
cannot spontaneously increase in expressive power. In order to do so, such a
system would have to be endowed with preadaptive foresight, possessing
predicates that are currently useless but will be relevant some day. Since
this is not possible, computational models are limited to the order of
complexity with which they began. They cannot outperform the control rules
that govern their operation (Tomovic, 1978). Natural systems, on the other
hand, are open to complexity and require a construal of control information
that is self-complexing. Using the fixed hardware of computers to explain
brain function is useless because the computer was designed relative to human
brains. The symbolic descriptions entailed by the hardware must be tied to
the dynamics of the human user.

3.4 An Artificial History Is No Substitute for a Natural History

For artificial systems, algorithms and data are needed in order to
provide an artificial history for a device that has no history in a natural
environment (Shaw & Todd, 1980). In other words, there is not a natural
relationship between a computer and an environment, so a relation (in the form
of programs) must be imposed. Animals, however, do have a natural and
mutually constraining relation with their environments by virtue of ontogeny
and phylogeny, and dynamical laws. They do not need to embody knowledge about
that relation explicitly; the mutuality is a fact of animal-environment
systems (for a discussion of animal-environment mutuality, see Gibson, 1979;
Michaels & Carello, 1981; Shaw & Turvey, 1981).

3.5 The Specification of Representations (and Computational Procedures) Is
Unprincipled 



A representation may be defined as an abstract or concrete structure
whose properties symbolize the properties of some other structure by means of
a relation. As adumbrated in 3.2, a discrete, alternative description of some
complex process is distinguished, in part, by its limited detail with respect
to the detail of the process that it represents. Presumably, wherever
representations are realized, it is of little practical utility to represent a
thing in other than reduced form. Two closely related questions should be
raised: (i) on what grounds and by what means does a particular representation
get created rather than another, symbolizing a particular set of
properties rather than another; and (ii) what determines how much detail a
representation should include given that it does not equal the detail of the
reference object? A theory of cognition that abides by the
representational/computational point of view must give a principled basis for
answering these two queries. No such principled basis has yet been advanced
and it is not likely to be forthcoming.

Let us look at the two questions from the perspectives of physics, the
perceiving organism, and the scientist seeking a computer simulation of visual
perception. In physics, the two questions press the need for a more profound
understanding of dynamics. The second question requires (among other things)
an account of how simplicity grows spontaneously from complexity, where
complexity is equated with the number of degrees of freedom that can be
followed in detail in a dynamical description, and simplicity is equated with
the degrees of freedom remaining in the alternative description, given the
equation of constraint. As already noted, there are encouraging signs that
this account can be given in the coupling of statistical mechanics and
nonequilibrium thermodynamics (Morowitz, 1968, 1978; Prigogine, 1980; Soodak &
Iberall, 1978). However, understanding how some detail is lost and, thus, how
structure can emerge from less structure, or even homogeneity, is not
sufficient. Together the two questions require not just an explanation of how
some detail is lost but an explanation of how that loss is special: A
continuous dynamical process and its boundary conditions specify an alternative
description that is privileged with respect to the dynamical processes
that it constrains. Physics has no choice but to try to understand an
alternative description (a representation) as an a posteriori fact of dynamical
processes. It requires a theory of specification, of how a particular
conjunction of dynamical processes and boundary conditions specifies a
particular non-holonomic constraint. Again, the recent attempts of Thom (1973) to
derive biological organization from the qualitative properties of a dynamical
topology may prove helpful in this regard, as might the work of Haken (1977)
and others to track mathematically a system's competing nonlinear modes. The
point is that physics must pursue a principled account of the specification of
alternative descriptions. A similar pursuit, however, does not characterize
the representational/computational approach to cognition.

Indirect realism (the philosophical orientation of cognitive science) 
supposes that the ability of an organism to perceive significant aspects of 
its environment rests on the ability of the organism to represent those 
aspects internally. To perceive a thing x that is a token of type X will 
involve a set of descriptors proprietary to X in this sense: They are 
necessary and sufficient to distinguish X from other types and they are near 
optimal for distinguishing among X's tokens. Further, given the standard 
construal, to perceive a token of X requires that the proximal data cast in 
the internal vocabulary of sensory transducers be recast in the internal 
vocabulary appropriate to type X. The outputs of transducers are noncommittal
on the type X of which x is a token. It is this fact that engenders a well- 
motivated reservation among orthodox perceptual theorists (e.g., Gregory, 
1970) about feature detectors and the like. Admitting to the significance of 
the discovery for understanding perception, they point to the non-trivial 
problem that the same featural data can mean any of several alternative 
things. 

There are two implicit acts of specification in the preceding, neither of
which is addressed satisfactorily, if at all, by the representational/
computational view of mind: (i) the conditions that point to a particular
descriptor set as proprietary to X; and (ii) the means by which non-committal
outputs from transducers or feature detectors point to X's descriptors as
being the ones appropriate for describing the current proximal stimulation.
One might say that both of these are simply matters of induction. But the
problem of induction (Goodman, 1963), here the problem of why some
representations or why some descriptor set should be "projected" rather than others,
is resolved, it would seem, only by assuming a non-inductive act of (ostensive)
specification. In the spirit of the Gestalt proposal of a Law of
Prägnanz, some scholars posit a benchmark, a simplicity metric, that weeds
out a priori the unacceptable projections from the acceptable projections
(e.g., Fodor, 1975; Hochberg, 1978). To avoid a vicious regress, the origin
of this metric must be outside the purview of nondemonstrative inference.

Turning to the seeing machines of Artificial Intelligence, it is tempting
to regard some of them as fulfilling what might be taken conventionally as the
criterion for a successful simulation of perception (e.g., Marr & Nishihara,
1978). They begin with the description of the retinal mosaic produced by a
thing x and they end with a description of x in the vocabulary appropriate to
its type. Such simulations, however, are with respect to things of a single
type and the problem of which descriptor set to use never arises. The builder
of a machine designed to see things of type X addresses only the question of
how the transducer output from a given x can be reliably transcribed into the
descriptor set S and how a description of x (as the stimulus) in terms of S
can be reliably matched to tokens of X in memory, also described in terms of
S. The determination of the proprietary set of descriptors S is, of course,
an intellectual achievement of the scientist who programmed the seeing
machine. No account of how the proprietary set S might arise without
intellectual intervention is attempted. Admittedly, the giving of such an
account is difficult and perhaps beyond the scope of current science.
Nevertheless, a general theory of specification is logically prior to and
perhaps inclusive of a general theory of representation (Shaw, Turvey, & Mace,
1982; Turvey, Shaw, Reed, & Mace, 1981): attempts to build the latter in an
unprincipled fashion (ignoring specification) seem misguided.

3.6 Natural Cognitive Systems Are Non-determinate, Which Is a Property That
Discrete Automata Do Not Have

Proponents of the computational point of view no doubt would agree that
where physical principles can account for a phenomenon, they should be allowed
to do so. But they would also contend that where physical principles fail,
special, extra-physical principles (i.e., not contained within physics but
compatible with the laws there identified) must be brought to bear. These
special principles must be called upon to explain cognitive phenomena with,
presumably, the privileged vocabulary of representation/computation.

Pylyshyn (1980), for example, offers "cognitive penetrability" as the
criterion for seeking extra-physical explanations. As interpreted by
Kugler, Turvey, and Shaw (1982), the underlying necessary condition for
cognitive penetrability "is that the behavior of the system in question is
non-determinate, that is, not dominated by boundary and initial
conditions." If this reading is correct, then a puzzle arises for those
wishing to explain such behavior on the basis of formal symbol-manipulating
machines: Linear and computational devices are determinate; the output is
completely specified by the initial conditions (input) and boundary conditions
(algorithms and representations). Where is the nondeterminacy that is
supposed to characterize cognition?
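
The sense in which a discrete automaton is determinate can be made concrete with a small Python sketch. The two-state parity machine below is a hypothetical example chosen for brevity, not a device discussed in the text; its transition table plays the part of the boundary conditions (the "algorithm") and the input sequence plays the part of the initial conditions.

```python
def run_automaton(transitions, start, inputs):
    """Run a deterministic finite automaton and return its final state."""
    state = start
    for symbol in inputs:
        # Each step is fully fixed by the current state and the input symbol.
        state = transitions[(state, symbol)]
    return state


# Hypothetical transition table: tracks whether an even or odd number of 1s
# has been read so far.
PARITY = {
    ("even", 0): "even", ("even", 1): "odd",
    ("odd", 0): "odd", ("odd", 1): "even",
}
```

Given the same table, start state, and input sequence, `run_automaton` returns the same final state on every run; nothing in the machine itself leaves room for the output to depart from what those conditions specify.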

Moreover, even the condition of nondeterminate behavior does not seem to
demand the privileged cognitive vocabulary. Dissipative structures (Prigogine,
1980) are physical systems wherein nonlinear components constrain fluxes
of energy such that the system's behavior resists, within limits, the initial
and boundary conditions to which it is subjected. More generally, living
things as members of the class of open systems exhibit, to varying degrees,
freedom from initial and boundary conditions suggesting that non-determinate
systems rather than determinate should be the source of metaphors for
cognition.

4. ALTERNATIVES TO THE COMPUTER METAPHOR

The relationship between computer science and the behavioral and brain 
sciences has been a symbiotic one in which each domain effectively raided the
other for explanatory concepts. But a denial of the exclusive use of the
computer metaphor demands a new direction for cognitive science. If not in 
computer science, then where are the model constructs for understanding 
cognition to be found? 
Two alternatives will be presented. They are alike in that both try, as
much as possible, to explain cognitive capabilities without reference to
"special" (in the sense of extraphysical) entities. Both are moves away from
the notion that human and animal intellectual abilities require uncommon
explanations. The shared strategy is a simple one: Discrete symbol strings
(e.g., representations, propositions, rules) are not to be offered as knee-jerk
explanations of those coordinations of organism and niche that constitute
the phenomena of knowing. The processing of symbol strings need not be
considered as an explanation of cognitive phenomena where physics will
suffice. The two alternatives we will describe differ with regard to the
point at which, or whether or not, symbol-string processing has to be
introduced. In other words, that a good many "privileged" cognitive abilities
are more simply understood in terms of underlying physical principles rather
than in terms of processing symbol strings is not questioned; whether or not
all cognitive abilities are to be understood only in terms of physical
principles distinguishes the alternatives described here.

What may be considered the less extreme approach takes issue with the 
emphasis of standard theories on the discrete mode to the neglect of the 
dynamical mode of a system's behavior in trying to understand that system's 
intelligence. Rather, our first alternative is an argument (anticipated in 
Section 3.2) that neither mode alone is sufficient. Intelligence is only to 
be understood as a coordination of discrete symbols and continuous processes 
with explicit recognition of their incompatibility (Pattee, 1974, 1977, 1982). 

The more extreme approach is motivated in part by a reluctance to embrace 
notions that are consonant with the dualism of mind and body, a dressed down 
version of animal-environment dualism (Michaels & Carello, 1981; Turvey & 
Shaw, 1979). On this account, the notion of discrete symbol manipulation and
continuous dynamics as formally incompatible, complementary processes is
unsatisfactory: Symbol-matter dualism (Pattee, 1971) is not only continuous
with the classical dualisms, but it is those dualisms in their most unadorned
form. But if the complementarity strategy were to be denied, what would
remain? Quite simply (sic), it would be the strategy of elaborating continuous
dynamics. By this dynamical strategy, the so-called discrete mode would
be relieved of all explanatory role and relegated to the status of just one way
(out of several or many ways) that a complex system might behave.

The more extreme approach is motivated further (and relatedly) by a 
concern that indulging the Complementarity Approach may not be in the best 
long-term interests of science. Literally interpreted, the complementarity 
claim holds the discrete, symbolic mode (qua control information and qua
information acquired by measurement; Pattee, 1973) distinct from physics.
This is partly in response to a strategy wherein many physicists have pursued
a view of "information" as just another physical variable, like energy or
matter (e.g., Layzer, 1975; Tribus & McIrvine, 1971). The objection is that
equating "information" with negative entropy or a measure of objective order
fails to capture the role that "information" plays in explanations of
biological and psychological phenomena. To the criticism that the orthodox
physical interpretation of information is too narrow, the Complementarity
Approach (literally interpreted) adds the criticism that it is a category
mistake (Ryle, 1949): Information in biological and psychological contexts is
not reducible to physics. In short, information requires a proprietary,
extraphysical explanation.

Pattee has persistently prodded the scientific community to consider
seriously information's ontological status. His impression is that definitive
arguments in favor of or against information as a physical variable cannot be
constructed because such arguments depend on clear and agreed upon conceptions
of control and measurement that currently elude us (Pattee, 1979). The terms
"control" and "measurement" pick out two relations between dynamics (a
rate-dependent process) and information (a rate-independent process) and they
identify two, as yet unresolved, epistemological issues. Coming to grips with
the concept of information, therefore, is not just a matter of more physics.
In the meantime a variety of considerations give the nod to complementarity
and not to physical reduction (Pattee, 1979, 1982; Yates, 1980).
Complementarity is advanced as a principle that calls for simultaneous use of
formally incompatible descriptive modes in the explanation of natural
phenomena. Rather than attempting to dissolve the dualisms (symbol/matter,
mind/body, subject/object, etc.) the advocated strategy is to accept them as
fact.

Unfortunately, an endorsement of information and dynamics as complementa- 
ry raises the spectre of a scientifically intractable problem, viz., the 
origin of information, and it is this spectre that the more extreme approach 
wishes to avoid. The detour can take only one direction, that of elaborating
dynamics. It cannot, however, skirt the epistemological terrain carefully
mapped out by Pattee. We are sure that Iberall (1977) can be counted among
those pursuing a dynamical route to information and we suspect that it is the
route most consistent with the goals of the ecological approach to knowing
that was conceived and developed by Gibson (1979).

Each of these approaches, the Complementarity Approach and the Dynamical
Approach, will be discussed in more detail in the next four subsections.
While we will align ourselves with the Dynamical Approach, we nonetheless note
a certain kinship with the Complementarity Approach to the extent that both
orientations share misgivings about the Discrete Mode Approach that dominates
cognitive science.

4.1 The Complementarity Approach

We have identified two modes of system functioning where the discrete
mode is characterized as rate-independent operations on a finite set of
symbols and the continuous mode refers to the rate-dependent interplay of
dynamical processes. What would it mean to understand cognitive abilities as
a coordination of these two modes? One strategy is to look at actual living
systems to see how they use symbol strings and dynamics. Beginning at the
cellular level, for example, and up through the evolutionary scale, how do
strings and dynamics coevolve? Are there universals of string/dynamics
interactions that might be appropriate to an understanding of the cognitive
functioning of living systems (Pattee, personal communication)? Consonant
with this strategy, let us return to the problem of enzyme folding (see
Section 3.2) for an examination of the complementarity of the two modes.
Recall that this particular example consists of two qualitatively different
phases: the genetic code synthesizes an amino acid string that then folds
into a functioning enzyme. The translation of the DNA symbols into amino acid
strings is a discrete symbolic process, while the folding of the one-dimensional
amino acid string into a three-dimensional machine is a continuous
dynamical process. The former is a constraint on the latter. To describe the
relation as one of constraint is an important step, for it suggests that the
system's meaning (its dynamic ability) does not merely reduce to a symbolic
representation. The symbolic mode harnesses the forces responsible for the
function but the symbolic mode is not equated with the function. But neither
is the dynamic mode completely autonomous. The folding of the enzyme cannot
proceed until the code provides the necessary constraint. In other words,
neither mode alone is sufficient for the activity in question.

The effort to ground cognitive abilities in the complementarity of the
discrete and dynamic modes is a significant departure from standard
computational/representational approaches. The significance lies in the observation
that the discrete symbolic mode (the "information" processing) is kept to a
minimum in natural systems (Pattee, personal communication). Information
construed linguistically does not provide all of the details for a given
action; it acts as a constraint on natural law so that the dynamic details
take care of themselves. In other words, most of the complex behavior of
living systems is essentially self-assembly, which is "set up" by symbol
strings but not explicitly controlled by them. This should be no less true of
the cognitive activity of biological systems. Complete comprehension cannot
be had by appealing to symbol-string processing or physics alone. Both must
be used together but in a special way: Use physics cleverly so that symbol
strings need only be used sparingly in order to assure the parsimony of the
explanation.

The failings itemized in Section 3 with regard to the computer metaphor
are addressed by the Complementarity Approach as follows: (i) By looking at
the coevolution of symbols and dynamics, this approach necessarily and
pointedly incorporates the constraints that a system's physical biology places
on its behavior; (ii) In the assertion that neither mode alone is sufficient,
the dynamic mode is granted equal footing with the symbolic mode in embodying
a system's intelligent activity; (iii) By acknowledging that natural systems
do not execute solely in the discrete mode, the Complementarity Approach can,
in principle, account for self-complexing where new primitives emerge from the
underlying dynamics; (iv) The coevolution of symbol strings and dynamics
obviates the need for a system's history to be carried, in cumbersome detail,
by the symbolic mode and suggests, instead, that the natural history is
captured in the complementarity relationship; (v) Two principles, parsimony
and minimal information, are offered as guidelines for the introduction of the
detail to be carried by a symbol string; (vi) The dynamic self-assembly of
natural systems, of which cognitive systems are an example, is constrained but
not determined by the symbolic mode.

4.2 The Dynamical Approach and Ecological Realism

In Section 2.0 we suggested that it was the framework of indirect realism
that made the computer metaphor alluring to the behavioral and brain sciences.
A framework of direct or ecological realism, however, will not share the same
sympathies. Indeed, direct or ecological realism, as promoted by Gibson
(1979) and others (e.g., Michaels & Carello, 1981; Turvey et al., 1981),
disallows many of the constructs that are part and parcel of a
representational/computational orientation and demands a very different class of machine in
order to model cognitive activity.

Consider the following comments of Gibson in reference to orthodox
approaches to perception: 



Adherents to the traditional theories of perception have recently
been making the claim that what they assume is the processing of
information in a modern sense of the term, not sensations, and that
therefore they are not bound by the traditional theories of perception.
But it seems to me that all they are doing is climbing on the
latest bandwagon, the computer bandwagon, without reappraising the
traditional assumption that perceiving is the processing of inputs.
I refuse to let them pre-empt the term information. As I use the
term, it is not something that has to be processed. (Gibson, 1979,
p. 251)

Not even the current theory that the inputs of the sensory channels
are subject to "cognitive processing" will do. The inputs are
described in terms of information theory, but the processes are
described in terms of old-fashioned mental acts: recognition,
interpretation, inference, concepts, ideas, and storage and retrieval
of ideas. These are still the operations of the mind upon the
deliverances of the senses, and there are too many perplexities
entailed in this theory. It will not do, and the approach should be
abandoned. (Gibson, 1979, p. 238)




The gist of those quotations is plain: Perceiving does not involve
cognitive intermediaries; it does not involve the making of representations or
the evaluating of propositions. The central and fundamental role of explicit
symbol-manipulating processes in the orthodox treatment of perception is
repudiated by Gibson. For Gibson, information in the case of vision is
optical structure that is lawfully generated by environmental structure (e.g.,
the layout of surfaces) and by movements of the animal (both movements of the
limbs relative to the body and movements of the body relative to the
environment). This optical structure is not similar to its sources, but it is
specific to them in the sense of being nomically dependent on them. For
Gibson these nomic dependencies comprise an important subset of the laws at
the ecological scale that make possible the control of activity.

By the 'perceiving of a thing x' Gibson means something very particular,
namely, that (1) there is information about the thing x in the sense of
specific to the thing x; and (2) the information about the thing x is picked
up, or detected, by the organism (see Turvey et al., 1981, for a more detailed
discussion). It is because of the specificity of information identified in
(1) that the fulfillment of (2) does not involve interpretive, elaborative,
restorative, constructive, etc., operations. Considerable confusion surrounds
this assertion. A common misreading is that it denies the organism (or its
central nervous system) any substantive role in perceiving. In truth, what
the assertion denies is the orthodox interpretation of that role. Information
in Gibson's sense does not require processing (by epistemically laden
operations) but its pick up does involve processes. Gibson (1966, 1967) gives
hints that these processes are closer to the processes identified by physics
and systems theory than to the processes commonly identified by neuroscience,
psychology and computational science. Thus he refers, informally, to
'resonating,' 'optimizing,' 'symmetricalizing,' 'equilibrating,' 'orienting,'
'adjusting,' and the like.

Although one could read the foregoing terms as labels for happenings in
the brain, Gibson resists this move. He ascribes these terms to the states of
a perceptual system, where a perceptual system is defined by an organ and its
adjustments at a given level of functioning, and where incoming and outgoing
fibers comprise a continuous loop (Gibson, 1966, 1979). And he intimates that
the states to which at least some of these terms refer may well be distributed
over the organism and its environment: Do a perceptual system and the
information that it picks up comprise a unitary system that 'equilibrates'?

The computer provides a metaphor for the processing of information in the 
orthodox treatment of perceiving, but what kind of machine could provide a 
metaphor for the pick up of information in Gibson's heterodox treatment of 
perceiving? We do not believe any such machine currently exists.
Nevertheless some steps can be taken toward its definition.

To begin with it seems that the machine in question must be of the
dynamic sort (governed by law) rather than of the symbolic sort (governed by
rule). Second, it seems that the machine in question must be an ensemble of
special purpose dynamical responses to specific dynamical challenges.
Gibson's construal of information implies that there are properties of ambient
energy distributions that are unique and specific to behaviorally related
properties of the environment and to the organism's relationship to the
environment (e.g., moving forward rectilinearly, turning, etc.). These ambient
energy properties are not replaceable by (putatively) more elemental
properties. It has been suggested that if the pick up of an ambient energy
property of the kind envisaged by Gibson (also see Lee, 1980, for an
established instance) does not, therefore, involve a preliminary decomposition
into more molecular properties (followed by a knowledge-guided inference or
synthesis), then that pick up must be achieved by a device tailored to the
property (Runeson, 1977). The notion of an ensemble of special purpose
dynamical solutions raises questions of the physics that molds them and the
physics that relates them. Answers are beginning to take shape (e.g.,
Iberall, 1977, 1978a, 1978b) and will be required if the machine in question
is to materialize.

A more disquieting question is raised by the simple recognition that for
a dynamical machine to suffice as a metaphor it would have to be
systematically affected by its challenges. It would have to have a history.
Gibson (1966, 1979) speaks of perceptual systems being "attuned" to
information in the two senses of (i) becoming able to detect a particular
information kind; and (ii) becoming better at detecting a particular
information kind. The disquieting question is how a machine governed solely
and strictly by dynamical laws can have a history given that dynamical laws
are ahistorical. On this question it would appear that the Dynamical Approach
must give way to the Complementarity Approach. Dynamical history in the
Complementarity Approach has a placeholder (the discrete, symbolic mode), but
what and where is dynamical history's placeholder in the Dynamical Approach?

In the next section we take a look at potential machines as examples of
dynamical machines that are necessarily special purpose; in section 4.4 we
elaborate the question of history in dynamics and express some thoughts on how
it might be addressed.

4.3 The Dynamical Approach and Potential Machines

It is ironic that A. M. Turing, who is unsurpassed in his contributions
both to the concept of discrete automata and to the computer metaphor for
intelligent activity, should have made a seminal contribution to the explicit
understanding of potential machines (Turing, 1952). Indeed, one might regard
the Dynamical Approach as a call to rally behind the later (1952) rather than
the earlier (1950) Turing, and the Complementarity Approach as a call to rally
behind both Turings.

What is a potential machine? It is any system in which "potentials"
(roughly, energy reservoirs) are available for the play of the system's
trajectories in state space (or mathematical domain). The "themes" from which
the system's trajectories are fashioned include attractors, basins, and
separatrices. These themes emerge and dissolve as a function of changes in
the layout of potentials. This layout of potentials plays (implicitly) the
same organizing role as the governing dynamic equation set plays (explicitly)
in the digital computer.
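
The vocabulary of attractors, basins, and separatrices can be given a concrete, if deliberately digital, illustration. The Python sketch below is an assumption-laden toy, not one of the potential machines discussed in the text: it assumes a one-dimensional potential V(x) = x^4/4 - a*x^2/2 and overdamped motion dx/dt = -dV/dx, integrated with simple Euler steps. The function names, the parameter a, and the step sizes are all introduced here for illustration.

```python
def force(x, a):
    """Negative gradient of the assumed potential V(x) = x**4/4 - a*x**2/2."""
    return -(x**3 - a * x)


def settle(x0, a, dt=0.01, steps=20000):
    """Follow the overdamped trajectory dx/dt = -dV/dx from x0 (Euler steps)."""
    x = x0
    for _ in range(steps):
        x += dt * force(x, a)
    return x


def attractor_reached(x0, a):
    """Round the settled position so that nearby end points compare equal."""
    return round(settle(x0, a), 6)
```

With a = 1 the layout has two attractors, near x = -1 and x = +1, whose basins are divided by a separatrix at x = 0; starting points on either side settle into different wells. With a = -1 the same rule yields a single attractor at x = 0. Changing the layout of the potential thus makes themes appear and disappear without any change in the rule that the trajectories follow. Note that this computes one trajectory at a time, which is the digital limitation the text goes on to contrast with potential machines.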

The governing logic for a potential machine braids topological properties 
with physical laws (e.g., conservation principles). The end-product is a
geometro-dynamic logic that generically couples physics to geometry (Abraham &
Shaw, 1982; Thom, 1975). The geometro-dynamic logic is universal for potential
fields; that is, the design logic is independent of the material
composition. Because of the generalizable nature of dynamic patterns, it is
possible to use the layouts of attractors, basins, and separatrices of one 
material substance to study the dynamic properties of a materially different 
system with the same or similar layouts. In other words, a substitute 
geometro-dynamic field can be used to study the unfolding (or evolution) of 
trajectories for a wide class of dynamic systems (many of which defy direct 



photo-elastic machine (Frocht, 1941); (ii) the Hele-Shaw parallel-plate
machine (Lamb, 1932); (iii) the Chladni-Faraday vibrating machine (Faraday,
1831; Waller, 1961); (iv) the Rayleigh-Bénard simmering machine
(Fenstermacher, Swinney, Benson, & Golub, 1979); (v) the Couette-Taylor
stirring machine (Koschmeider, 1977). An example of a potential machine in
biology is the piezo-electric effect in bone growth, a transduction of
mechanical stress patterns to electric voltages to bone growth.

Each of the above is a physical machine that simulates the behavior of
some system without any symbolic representation of that behavior. The
simulations or "solutions" are not the result of formalisms entailing some
form of recursive function theory but rather are the result of equilibrations
occurring within competing processes of energy flow systems. For these
machines, the field "solves" its own self-defining equation sets. Whereas
dynamic modeling with a digital computer may provide accounts of single
trajectory solutions, it does not provide accounts of the continuum field
properties. This limitation is the reciprocal of that of potential machines;
that is, a potential machine can exhibit properties of a continuum field
nature but it cannot isolate a single trajectory solution nor precisely
identify the initial conditions of an equation set. We briefly describe two
potential machines and an unsuccessful programmatic attempt to build general
purpose potential machines.

4.3.1 Photo-elasticity: A photo-elastic analogue for solving problems
in field mechanics (Frocht, 1941; Love, 1944; Sommerfeld, 1934). The
theoretical similarity between field problems in Hamiltonian ray mechanics and
Newtonian particle mechanics can be experimentally realized using
photo-elastic components to model the field dynamics of stress properties in
mechanical systems. The photo-elastic field's similarity in character to the
Hamiltonian ray mechanics field properties allows for its use as a dynamic
simulator for problems in Newtonian continuum mechanics. In this
sense, an electro-magnetic field can be used to generate solutions to problems
involving a continuum mechanical field. Analogue machines can be designed
that simulate or "model" the stress fields arising in continuum mechanical
fields. There is reciprocity in simulation, allowing for the inverse
possibility of a continuum mechanical field being used to "model" or
"simulate" an electro-magnetic field. The photo-elastic simulator involves a
piece of stressed plastic through which a polarized light field is passed. The
index of refraction generates a patterned field of stress contours that is
proportionally similar to the stress contours of a related mechanical field.
These simulations are not analytic. Rather they are dynamic simulations
involving no explicit processing of symbol strings. The problems are solved
dynamically within the field; that is, the system's trajectories are powered
by the available potentials and constrained by their geometrical layout in
accordance with the conservation principles. As long as potentials provide a
source of energy to the system, equilibrating trajectories will be defined.

4.3.2 Hydrodynamics: The Hele-Shaw simulator. The Hele-Shaw simulator
(Lamb, 1932; Shaw, 1980) was designed to solve a limited set of problems in
fluid mechanics. The machine is a hydrodynamic device in which a
two-dimensional liquid flow is established between close parallel plates. Various
obstacles can be inserted into the flow stream so as to create new source/sink
layouts associated with consequent changes in the field's kinetic patterns.
For the most part, those results could be generalized to any two-dimensional
flow field whose structure was constrained within the laminar domain.

4.3.3 An attempt at a general purpose electro-dynamic computer: The
Gutenmakher enterprise. Digital and potential machines are distinguished on
the issue of self-organization: Potential machines self-organize; digital
machines (as yet) do not. A digital machine's set of trajectories (output state
space) is formally closed and explicitly restricted by limits defined in the
equation set. A potential machine's set of trajectories is open and can
evolve as a function of ranges and domains of accessibility for the
operational parameters. Whereas the digital machine is a general-purpose device
that can be designed to instantiate an indefinitely large number of rules, a
potential machine is a special-purpose device that is successful in
specialized circumstances by virtue of a particular geometry linked to a
particular subset of physical laws. This restriction on potential machines has
severely limited their applicability as general purpose computers. Gutenmakher
(1963) details the most extensive programmatic attempt at using a potential
machine as a general-purpose computing machine. The Gutenmakher laboratory was
Russia's brain-trust to compete with the digital computer evolution in the
West. The Russians sought an "electro-logical, chemico-logical,
mechanico-logical device" in the belief that it would prove to be a more
general purpose (and powerful) device than the discrete automaton. Their
attempt failed for two major reasons: (i) it was premature, and (ii) dynamic
logic is necessarily special-purpose, unlike digital logic, which can be
general purpose. The machine pursued by Gutenmakher could solve classes of
problems untouchable by the digital machine; the economic needs, however, were
for a general-purpose device. (In part, the failure of the Gutenmakher project
accounts for the present inferiority of Russian computer technology.)

4.4 The Dynamical Approach: Duality Rather Than Complementarity?

Although the potential machine is the model that seems better suited to 
the framework of ecological realism, we can identify two related problems that 
must be resolved in order for such a machine to be minimally adequate to model
cognitive phenomena: (1) complementarity is continued, and (2) time (and,
therefore, history) plays no role in dynamical law. In this section, these
problems are identified and a framework in which the resolution might be found
is sketched.

The two types of machines, the potential and the symbol manipulating, can
be distinguished as law-governed and rule-governed, respectively. In the
language of the Complementarity Approach, these would correspond to the
dynamical and symbolic modes. With regard to problem (1), then, the two
classes of machines continue the distinction between the two modes and enforce
the distinction between those aspects of phenomena each can be said to
describe: Phenomena per se, in uninterpreted form, favor the common bases
established in potential machines, while formal simulations of phenomena favor
the representative forms provided in symbol-manipulating machines. We have
not yet resolved, therefore, the paradoxical relationship described by Pattee
(1982):

Complementarity is not to be confused with tolerance of different 
views. It is not a resolution of a contradiction, as if you were to
agree that we are simply "looking at the problem from different
perspectives"... Rather, it is a sharpening of the paradox. Both
modes of description, though formally incompatible, must be a part
of the theory, and the truth is discovered by studying the interplay
of the opposites (pp. 27-28).

Potential machines and symbol-manipulating machines are considered
opposites insofar as the former are law-governed and the latter are
rule-governed. But is this itself the criterial distinction or does it merely
create that critical property by which the two classes of machines are
necessarily distinguished? If the latter, what might this property be? One
important feature of dynamical laws in traditional (Hamiltonian) physics is
that time is an extrinsically imposed state label. As a consequence, the future
state of the system can be predicted only on the basis of current state
information and the law. In other words, the history of dynamical systems
cannot be reclaimed.


With regard to problem (2), then, a potential machine under classical,
quantum mechanical, or relativistic dynamical law would be a machine whose
history would play no role in its future. (In contrast, symbol-manipulating
machines are equipped with a history by a program.) There is clearly
something lacking in potential machines when applied to humans and animals
with learning histories to guide them. Bertrand Russell (1921) suggested that
the omission is one of mnemic determination: current constraints must be
augmented by historical constraints that produce a tendency. But if classical
laws are not time-bound, how can dynamical models (potential machines) be
adequate models for psychological (mnemic) phenomena? The answer depends on
the possibility of introducing mnemic relations into the laws that govern
potential machines. It is our contention that this is currently being
accomplished under the efforts of contemporary physicists such as Prigogine
(1980), Iberall and Soodak (1978), Haken (1977), and others to make time an
intrinsic part of dynamic law such that history is no longer an alien concept.

If this is indeed the case, how are "opposites" such as mnemic (past
temporal) constraints and physical (future-pending) constraints to be
construed? Complementation enforces dualism, which is not countenanced by
ecological realism. Yet these opposites are not simply symmetrical
perspectives. Rather, we suggest that the relation is one of duality (a
mathematically defined relation as opposed to dualism, a philosophically
defined position) wherein there exists a class of potential machines, PM,
governed by future-pending laws and a dual class of potential machines, PM',
governed by past-dependent laws. We can only speculate about the possibility
that there exists a class of machines, M, with a generalized dynamics that
incorporates PM and PM' as coordinated (dual) submachines. (Shaw and Todd
[1980] provide a formal description of an analogous pair of dual abstract
machines.)

Because the Complementarity Approach finesses many of the failings of the 
computer metaphor simply by acknowledging the role of dynamics in natural 
systems, the solutions from the dynamical approach will not be appreciably 

The computer metaphor was criticized because there is no principled basis
for specifying (i) which representations are created and (ii) how much detail
a particular representation should include (see Section 3.5). The
Complementarity Approach does not address point (i) specifically but it does
address a related point, namely, when a representation should be created, by
putting a premium on parsimonious explanations: if the physics is getting too
complex, a symbol or symbol string should be allowed to restore simplicity.
And, given the conviction that cognitive systems should be consonant with
other natural systems, point (ii) is answered with the stricture that the
detail carried by a representation should be minimal. We are not at all
convinced, however, that such a tactic solves the problem satisfactorily. It
seems to be a tactic for the scientist trying to explain nature rather than a
tactic of nature itself.

In denying the equation of information with representation and in
promoting the equation of information with specification, the Dynamical
Approach, tempered by Gibson's ecological realism, replaces the question of
how representations are specified with questions of the kind: How is optical
structure specific to what activity can be done (by an organism of a
particular type in a particular setting), how it can be done, and when it can
be done? For example, how is optical structure specific to a place that
permits stepping down (rather than, say, falling off), specific to how the
stepping down is to be conducted, and specific to when the stepping down should
be initiated?

Our impression is that answering questions of the nomic dependence of
optical structure on facts of the animal-environment system will illuminate,
in a very general way, the specificational perspective on information
emphasized by the Dynamical Approach. One might say that, in contrast, the
Complementarity Approach emphasizes an indicational or injunctional
perspective on information, preserving the qualitative tenor of formal
information theory. Not surprisingly, Gibson sees the latter as a misplaced
emphasis:



"There is a vast literature nowadays of speculation about the media
of communication. Much of it is undisciplined and vague. The
concept of information most of us have comes from that
literature." "...we cannot explain perception in terms of
communication; it is quite the other way around. We cannot convey
information about the world to others unless we have perceived the
world. And the available information for our perception is
radically different from the information we convey." (Gibson, 1979,
p. 63; author's italics.)

The indicational sense of information is not exclusive. It is distinct
from the specificational sense and predicated upon the specificational sense.
In short, understanding information as specific is logically prior to
understanding information as indicative (compare with Section 5.5). Explicit
recognition of this priority distinguishes the Dynamical Approach from the
Complementarity Approach.

PERCEPTUAL INTEGRATION OF SPECTRAL AND TEMPORAL CUES FOR STOP CONSONANT
PLACE OF ARTICULATION: NEW PUZZLES

Bruno H. Repp



Abstract. A replication of a recent study by Tartter, Kat, and
Samuel (Note 1) was attempted in four parallel experiments. The
experiments concerned the way in which VC and CV formant transitions
are perceptually integrated into a single stop percept when they
occur in a VC-CV utterance, separated by a variable silent closure
period. Certain aspects of the Tartter et al. data were replicated,
but there was extreme variability both across different stimulus
sets and across individual listeners. While the results disconfirm
earlier findings of complete CV transition dominance, they offer few
clues as to how listeners derive the phonetic percept from the cues
in the signal.

INTRODUCTION 

The perceptual information for stop consonants in intervocalic position
is distributed over time and can be divided into preclosure, closure, and
postclosure cues. The duration of the closure provides important information
about stop manner and voicing, as well as some cues to place of articulation,
the feature that the present study is concerned with. The major cues for
place of articulation, however, reside in the spectral changes immediately
preceding and following the closure interval, viz., in the preclosure (VC) and
postclosure (CV) formant transitions. (An especially important cue, the CV
release burst, is generally omitted from synthetic stimuli used in perceptual
studies, and the present experiment follows suit, for better or worse.) Since
these spectral cues can be integrated into a unitary stop consonant percept
over closure intervals as long as 200 msec (Repp, 1978), they represent an
especially interesting case for investigating the mechanisms of phonetic
perception.


One question concerns the weights given to these temporally separated
cues. Is the perceived place of articulation determined primarily by the VC
transitions or by the CV transitions? One way to find out is to juxtapose
conflicting sets of transitions. A number of experiments have shown that,
when the closure interval is too short to permit perception of two different
stop consonants in sequence, perception nearly always goes with the CV
transitions. For example, when the syllable /ab/ is followed by /da/ after
only 20 msec of silence, listeners generally report /ada/, rarely /abda/, and
never /aba/ (Abbs, 1971; Dorman, Raphael, & Liberman, 1979; Fujimura, Macchi,
& Streeter, 1978; Repp, 1978). These findings suggest that the CV transitions
are a far more powerful cue than the VC transitions.

However, a recent study by Tartter, Kat, and Samuel (Note 1) has
challenged this conclusion. Their approach differed from that taken in
previous experiments in that they did not juxtapose conflicting transition
cues, but instead chose roughly compatible VC and CV transitions for their
stimuli. At first blush, it seems that this procedure could not yield any
useful information. However, Tartter et al. took advantage of between-subject
variability in the following, rather ingenious way.

They constructed a synthetic CV continuum ranging from /ba/ to /da/ and,
by simply playing the stimuli backwards, obtained a corresponding VC continuum
ranging from /ab/ to /ad/. Then they concatenated corresponding
(mirror-image) stimuli from the VC and CV continua with varying silent
intervals in between, which resulted in several /aba/ to /ada/ continua. The
usefulness of this paradigm derived from the fact that not only was the
average location of the /b-d/ category boundary different on the VC and CV
continua, but there was also considerable individual variability in boundary
locations. This enabled Tartter et al. to perform a correlational analysis to
determine whether, on the whole, perception of the VC-CV stimuli resembled
more that of the VC components or that of the CV components in isolation. The
results showed, surprisingly, that neither VC nor CV perception was a strong
predictor of VC-CV perception at any of the different silent intervals. The
only significant correlations were obtained between CV and VC-CV perception
when the closure intervals were very short (0 or 25 msec). This effect was
reminiscent of the perceptual dominance of the CV transitions found in earlier
studies, although it was much weaker. Another noteworthy finding was that
VC-CV identification at very short closure durations (0 or 25 msec) was
unrelated to VC-CV identification at longer closure durations (50 or 100
msec), which suggested that the nature of the perceptual integration of VC and
CV cues changed between 25 and 50 msec.
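
The logic of such a correlational analysis can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which correlation statistic Tartter et al. used, so a Pearson coefficient is assumed here, and the boundary values in the usage section are invented placeholders, not data from either study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of per-subject scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either list has no variability.
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical category-boundary locations (in stimulus steps), one per subject.
    cv_alone = [2.4, 3.1, 2.8, 3.6, 3.0]
    vc_alone = [3.3, 2.9, 3.8, 3.1, 3.5]
    vccv_short_closure = [2.6, 3.0, 2.9, 3.4, 3.1]
    print("CV vs VC-CV:", pearson_r(cv_alone, vccv_short_closure))
    print("VC vs VC-CV:", pearson_r(vc_alone, vccv_short_closure))
```

If subjects' VC-CV boundaries tracked their CV boundaries but not their VC boundaries, the first coefficient would be high and the second near zero, which is the pattern that would indicate CV dominance.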

Although Tartter et al. were not able to conclude much more from their
data than that the perceptual interactions between the different cues were
rather complex, their findings are nevertheless intriguing. The absence of
any strong dominance of the CV transitions suggests that this effect may have
been an artifact of earlier procedures: The juxtaposition of strongly
conflicting VC and CV transitions, and the consequent acoustic and
articulatory discontinuity in the speech signal, may have disrupted the natural
process of perceptual integration and produced a kind of masking effect (cf.
Massaro, 1975). The stimuli of Tartter et al. were more realistic than the
earlier stimuli in that they contained relatively compatible formant
transitions, and they may have permitted perceptual integration of the sort
that occurs also in the perception of natural speech. Their results, even
though they are not easy to interpret, may nevertheless be more "ecologically
valid" than the earlier, deceptively simple findings of near-total CV
transition dominance.

The present study is a replication and extension of the Tartter et
al. experiment. (Their study also included conditions in which VC or CV
stimuli were followed or preceded by transitionless vowels; these conditions
will not be considered here.) A replication seemed useful not only because of
the complexity of their results but also because of two apparent
methodological weaknesses. One concerned their stimulus materials. Their CV
series constituted the center stimuli of a continuum previously used by Miller
(1981), but the labeling functions obtained were considerably flatter than
expected (Miller, Note 2), suggesting a possible loss in quality due to
multiple dubbing, or simply unusually high variability. Also, the VC stimuli
were rather crude, being merely the mirror images of the CV stimuli. One aim
of the present study was to use improved stimulus materials. The other
weakness of the Tartter et al. study was that the authors permitted only "b"
and "d" responses to VC-CV stimuli. Earlier studies suggest that, at the
longest closure interval used (100 msec), and perhaps also at the shorter
ones, subjects may occasionally have heard sequences of two different stops
("bd" or "db") but were not able to report them. In the present study,
therefore, all four types of responses were permitted.

The present experiment extended the Tartter et al. study in two ways.
First, two parallel sets of synthetic stimuli were employed. One of them was
modeled after natural speech and, therefore, was slightly more realistic than
the Tartter et al. stimuli. In that set, the VC and CV transitions were not
mirror images of each other. However, to replicate the Tartter et
al. procedures more closely, and also to investigate the possible role of
differences in detailed stimulus structure, a second, acoustically different
set of stimuli was employed in which the VC and CV transitions were mirror
images of each other. The second extension consisted of the use of /d-g/ as
well as /b-d/ continua. Thus, with two stimulus sets and two different
phonetic contrasts, the present study provided a strong test of the internal
consistency of the results.

METHOD

Subjects

Ten paid student volunteers and the author served as subjects in the
first half of the experiment (GC stimulus set). Eight subjects returned for
the second half (SYM stimulus set); two new volunteers and a research
assistant also took the test. The data of all subjects will be reported, for
listening experience seemed to have no systematic influence on the responses.

Stimuli 

The first set of stimuli, called GC (after the speaker from whose
utterances the synthetic stimuli were derived), has been described in detail
in Repp (l^). The set originally comprised 7-member /ab/-/ad/, /ad/-/ag/,
/ba/-/da/, and /da/-/ga/ continua. Only five members of each continuum were
used in the present study (Nos. 1-5 from the /ad/-/ag/ continuum and Nos. 2-6
from each of the other three continua). All stimuli were generated on the OVE
IIIc serial resonance synthesizer at Haskins Laboratories. Note that the VC
and CV stimuli were not mirror images of each other; they differed in formant
trajectories, pitch contour, duration, and amplitude. Within each continuum,
however, the stimuli differed only in the transitions of the second and third
formants.

The second set of stimuli, called SYM (for "symmetric"), was also created
on the OVE IIIc synthesizer, but without any specific human model. It, too,
comprised four 5-member continua. The VC stimuli were exact mirror images of
the CV stimuli. All stimuli were 230 msec long, had 30-msec linear formant
transitions, 200-msec steady states, and linearly changing pitch contours
(rising in VC and falling in CV stimuli). The steady states of the three
lowest formants were at 771, 1233, and 2320 Hz. The terminal frequency of the
first formant was at 283 Hz in all stimuli, that of the second formant ranged
from 1067 to 1425 Hz in the /b-d/ series and was fixed at 1770 Hz in the /d-g/
series, and that of the third formant ranged from 2311 to 2670 Hz in the /b-d/
series and from 2769 to 2396 Hz in the /d-g/ series.
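
The temporal structure of these symmetric stimuli can be made concrete with a short sketch. It is not the synthesis procedure actually used on the OVE IIIc; it only tabulates a piecewise-linear formant track under the stated timing (30-msec linear transition, 200-msec steady state), reading "terminal frequency" as the frequency at the closure end of the syllable. The function names and the 10-msec sampling step are assumptions made for the example.

```python
TRANSITION_MS = 30   # linear formant transition, from the text
STEADY_MS = 200      # steady-state portion, from the text
TOTAL_MS = TRANSITION_MS + STEADY_MS


def cv_track(terminal_hz, steady_hz, step_ms=10):
    """(time, frequency) samples for a CV syllable: transition first, then steady state."""
    track = []
    for t in range(0, TOTAL_MS + 1, step_ms):
        if t < TRANSITION_MS:
            f = terminal_hz + (steady_hz - terminal_hz) * t / TRANSITION_MS
        else:
            f = steady_hz
        track.append((t, f))
    return track


def vc_track(terminal_hz, steady_hz, step_ms=10):
    """Mirror image of the CV track: same sample times, frequencies reversed in time."""
    cv = cv_track(terminal_hz, steady_hz, step_ms)
    times = [t for t, _ in cv]
    freqs = [f for _, f in reversed(cv)]
    return list(zip(times, freqs))


if __name__ == "__main__":
    # First formant: terminal frequency 283 Hz, steady state 771 Hz (values from the text).
    for t, f in cv_track(283, 771)[:5]:
        print(t, round(f, 1))
```

The mirror-image construction assumes the sampling step divides the 230-msec duration evenly, so that the reversed samples fall on the same time points.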

All stimuli were digitized at 10 kHz and recorded on tape in random
sequences. For each stimulus set, there were four tapes, two for each
phonetic contrast. For the /b-d/ contrast, for example, the first tape
contained the 10 individual syllables from the /ab/-/ad/ and /ba/-/da/
continua, repeated 20 times, while the second tape contained the five pairings
of corresponding stimuli from the VC and CV continua at three different
closure intervals (20, 60, and 100 msec), repeated 20 times. The tapes for
the /d-g/ contrast were similar. Identical random sequences were used for the
GC and SYM tapes.
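
The tape composition described above works out to 10 x 20 = 200 trials on the isolated-syllable tape and 5 x 3 x 20 = 300 trials on the VC-CV tape. The sketch below reproduces that bookkeeping. The trial labels, the function names, and the use of an unconstrained shuffle are assumptions; the text does not say how the random sequences were generated. Reusing one seed stands in for "identical random sequences" across the GC and SYM tapes.

```python
import random

REPETITIONS = 20
CLOSURES_MS = (20, 60, 100)
STIMULI_PER_CONTINUUM = 5


def isolated_tape(seed):
    """Randomized trial list for the tape of isolated VC and CV syllables."""
    syllables = [(kind, n)
                 for kind in ("VC", "CV")
                 for n in range(1, STIMULI_PER_CONTINUUM + 1)]
    trials = syllables * REPETITIONS
    random.Random(seed).shuffle(trials)
    return trials


def vccv_tape(seed):
    """Randomized trial list for the tape of VC-CV pairings at three closure intervals."""
    pairings = [(n, closure)
                for n in range(1, STIMULI_PER_CONTINUUM + 1)
                for closure in CLOSURES_MS]
    trials = pairings * REPETITIONS
    random.Random(seed).shuffle(trials)
    return trials


if __name__ == "__main__":
    print(len(isolated_tape(seed=1)), len(vccv_tape(seed=1)))
```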

Procedure 

The subjects listened to the GC and SYM tapes in separate sessions. The
order of the /b-d/ and /d-g/ tapes within a session was counterbalanced across
subjects. The tape with the isolated syllables was presented before the tape
containing the corresponding VC-CV stimuli. The subjects were asked to assign
the consonant in each stimulus to either of the two relevant categories (e.g.,
"b" or "d"). The task for the VC-CV tape was to write down all consonants
heard, choosing from the four relevant possibilities (e.g., "b", "d", "bd",
"db").

RESULTS AND DISCUSSION

The stimuli of each continuum were labeled with reasonable consistency.
To reduce the data to manageable proportions, average response percentages
were computed over the five stimuli on each continuum. These average results
are plotted in Figure 1. Each panel shows the data for isolated VC and CV
syllables (on the very right) and three functions representing responses to
VC-CV stimuli, with closure duration on the abscissa. The solid function
plots the percentage of single-stop responses in the category listed on the
ordinate, while the two functions labeled VC and CV include, in addition, all
two-stop responses in which either the VC or the CV portion was assigned to
the category on the ordinate. Thus, for example, for the /b-d/ continua, the
solid function is based on "b" responses only, the VC function on "b" and "bd"
responses, and the CV function on "b" and "db" responses. The percentages of
"bd" and "db" responses may be obtained by subtracting the solid function from
the VC and CV functions, respectively. The reason for plotting the data in
this way is that, if VC and CV perception become increasingly independent as
closure duration increases, the VC and CV functions would be expected to reach
asymptotes at the response percentages for isolated VC and CV syllables. A
mismatch indicates that significant perceptual interactions persist at the
longest closure duration.

Figure 1: Average response patterns in the four conditions of the experiment.
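
The three plotted functions follow directly from the response tallies, as the sketch below shows. The function name, the dictionary layout, and the counts in the usage section are illustrative assumptions; the counts are not data from the experiment.

```python
def response_functions(counts, category="b", other="d"):
    """Return the solid, VC, and CV percentages for one closure duration.

    counts maps response labels (e.g. "b", "d", "bd", "db") to tallies.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to score")
    single = counts.get(category, 0)
    # Two-stop response whose first (VC) portion is the plotted category, e.g. "bd".
    vc_two_stop = counts.get(category + other, 0)
    # Two-stop response whose second (CV) portion is the plotted category, e.g. "db".
    cv_two_stop = counts.get(other + category, 0)

    def percent(n):
        return 100.0 * n / total

    return {
        "solid": percent(single),
        "VC": percent(single + vc_two_stop),
        "CV": percent(single + cv_two_stop),
    }


if __name__ == "__main__":
    # Hypothetical tallies for one closure duration on a /b-d/ continuum.
    tallies = {"b": 50, "d": 30, "bd": 15, "db": 5}
    funcs = response_functions(tallies)
    print(funcs)
    # Two-stop percentages are recovered by subtraction, as described in the text.
    print("bd:", funcs["VC"] - funcs["solid"], "db:", funcs["CV"] - funcs["solid"])
```

For the /d-g/ continua the same function would be called with category="d" and other="g", so that the VC function adds "dg" responses and the CV function adds "gd" responses.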

Let us now consider the data in some detail, focusing first on the /b-d/
continua (left-hand panels). In the GC set (top panel), the isolated VC
stimuli elicited considerably more "b" responses than did the isolated CV
stimuli; however, this difference is not interpretable because the VC and CV
stimuli did not bear any special relation to each other. In the SYM set
(bottom panel), on the other hand, there were somewhat more "b" responses to
isolated CV stimuli. This result is contrary to the finding of Tartter et al.,
who obtained more "b" responses to isolated VC stimuli in their symmetric
stimulus set. The cause for this difference is not known.

The response functions for the VC-CV stimuli in the /b-d/ series show
little sensitivity to the closure duration variable. A small number of
two-stop responses emerged at the longer closure durations. In these respects,
the results for the GC and SYM sets are quite similar. However, they differ
in the relation of the VC-CV results to the results for isolated
monosyllables. From the GC data one would have to conclude that, at the
shortest closure duration, VC and CV cues contributed about equally to the stop
percept. At the longest closure duration, the VC function approaches the
level of isolated VC syllables, but the CV function shows a higher rate of "b"
responses than isolated CV syllables, indicating that CV perception was not
independent of the VC context. In the SYM set, on the other hand, the VC-CV
functions start out at a level that suggests dominance of VC cues. (A similar
pattern was obtained by Tartter et al. but was not interpreted as dominance
for reasons mentioned below.) At the longest interval, the VC function is
close to the level for isolated VC syllables, as it was in the GC data, but
the CV function reflects fewer "b" responses than were given to isolated CV
syllables.

On the basis of these data, it may be argued that, at the longest closure
duration, the VC transitions exerted an assimilative effect on the perception
of the CV transitions. In view of the persisting high rate of single-stop
responses, such an assimilative effect would not be surprising. What is
surprising is that, in this interpretation, the VC transitions emerge as the
more salient cue. There is certainly no indication of CV dominance in these
data.

Consider now the results for the /d-g/ continua (right-hand panels). In
the GC set, isolated CV stimuli received more "d" responses than isolated VC
stimuli; again, this difference is not meaningful in itself. There was no
difference at all in the SYM set. The VC-CV results reveal striking
divergences. One feature the two stimulus sets have in common is a fair
proportion of two-stop responses at the longer closure durations, "gd"
responses being far more frequent than "dg" responses. However, the two
stimulus sets differ strongly in the effect of closure duration on single-stop
responses: "d" responses increased with closure duration in the SYM set but
decreased in the GC set. If "g" responses had been plotted instead, a
moderate decrease in the GC set would have contrasted with an extremely
pronounced decrease in the SYM set. As can be seen in the figure, this
difference is related to the fact that, at the shortest VC-CV closure duration
in the SYM set, subjects were much less likely to report "d" (and much more
likely to report "g") than for either stimulus component in isolation. In the
GC set, on the other hand, no such tendency was evident; the results at the
shortest closure duration suggest CV dominance. The results at the longest
closure duration are similar for the two stimulus sets (indeed, similar to the
/b-d/ results) in these respects: The VC function is close to the results for
isolated stimuli while the CV function is not. The higher rate of "d"
responses to CV syllables in VC-CV context than in isolation suggests a
contrastive effect of VC cues on CV perception, which is consistent with the
presence of a fairly large proportion of two-stop responses. Closer
examination of "gd" responses, which constituted the large majority of two-stop
responses for the /d-g/ series, revealed that they derived primarily from
combinations of /g/-like VC and CV transitions. This confirms the greater
perceptual lability of CV transitions in these stimuli.

In summary, the data in Figure 1 present a rather confusing picture. The
subjects' responses to VC-CV stimuli with a very short closure duration
suggest VC dominance in two conditions, CV dominance in one, and a strong
nonlinearity in the fourth. Increases in closure duration affected the two
/d-g/ continua in opposite ways and the two /b-d/ continua hardly at all.
Two-stop responses were more frequent on the /d-g/ than on the /b-d/ continua,
and there was a striking asymmetry favoring "gd" over "dg" responses.
Finally, the data at the longest closure duration suggest a dependence of CV
perception on VC perception but not vice versa; the effect is assimilative for
/b-d/ continua but contrastive for /d-g/ continua.

In addition, it must be mentioned that individual variability was
considerable. In each condition, there were some subjects whose 20-msec VC-CV
results suggested CV dominance, others whose results suggested VC dominance,
and still others whose results suggested neither. A number of subjects did
not give any two-stop responses at all, not even at the longest closure
duration, while others gave a large number. There was absolutely no relation
between the magnitude of the category boundary difference between isolated VC
and CV syllables and the proportion of two-stop responses given by individual
subjects; in other words, whether or not a subject reported hearing two
different stops in VC-CV stimuli did not depend on the degree of phonetic
mismatch of the two sets of transitions--another disturbing result. The
effect of closure duration on VC-CV identification was more consistent across
subjects, but even here there were striking exceptions. For example, one
subject (who listened only to the GC set and gave not a single two-stop
response) showed a systematic decrease of "b" responses with closure duration
in the /b-d/ condition and a systematic increase of "d" responses in the /d-g/
condition. Both patterns were highly atypical (cf. Figure 1). Needless to
say, the pattern of VC-CV identification responses at the 100-msec closure and
its relationship to the responses for isolated VC and CV syllables also
exhibited substantial variability.

The most confusing part of the results derives from comparisons of VC-CV
results with those for isolated VC and CV stimuli. Tartter et al. argued that
this comparison is not meaningful after they had found that transitionless
vowels preceding CV or following VC stimuli significantly affected consonant
perception. In other words, there may be performance changes between mono-
and disyllabic stimuli that have nothing to do with VC or CV dominance.
Nevertheless, one might have expected these changes to be in the same
direction in different stimulus sets and for different subjects, which did not
seem to be the case here. Instead of comparing response frequencies for mono-
and disyllables, Tartter et al. relied on a correlational analysis that was
also performed on the present data. For each stimulus continuum, all
intercorrelations of five average response percentages (VC, CV, and
single-stop VC-CV responses at three closure durations) were computed over
subjects. These correlations are shown in Table 1.



                                  Table 1

    Intercorrelations Between Average Single-Stop Response Percentages

                     /b-d/                            /d-g/
           CV      20      60      100      CV      20      60      100

    GC
      VC   .53     .35     .35     .55      .06    -.11    -.08    -.05
      CV           .38     .38     .41              .50     .50    -.05
      20                   .82***  .55                      .72**  -.18
      60                           .81***                           .34

    SYM
      VC   .21     .26     .58*    .32      .45    -.24     .48     .25
      CV           .??***  .??**   .26             -.18     .28     .53
      20                   .76**   .28                      .73**  -.06
      60                           .68**                            .58*

      *p < .05
     **p < .01
    ***p < .001
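
The intercorrelation procedure described above can be sketched as follows. This is only an illustration, assuming ordinary Pearson correlations computed over subjects; the function names and the response percentages are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(scores):
    """scores maps a measure name to a list holding one value per subject.

    Returns r for every pair of measures (10 pairs for five measures).
    """
    return {(a, b): pearson_r(scores[a], scores[b])
            for a, b in combinations(scores, 2)}


# Hypothetical average response percentages for six subjects (illustrative only).
scores = {
    "VC":  [40, 55, 62, 48, 70, 51],
    "CV":  [45, 50, 66, 40, 72, 58],
    "20":  [42, 53, 60, 47, 75, 55],
    "60":  [44, 58, 61, 45, 71, 50],
    "100": [50, 49, 57, 52, 60, 54],
}

table = intercorrelations(scores)
for (a, b), r in table.items():
    print(f"{a:>3} x {b:<3}  r = {r:+.2f}")
```

Five measures yield ten pairwise correlations per condition, which is the number of filled cells in each triangular matrix of Table 1.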



The leftmost cell in each matrix represents the correlation between VC
and CV identification. It tended to be positive but was not significant in
any of the four conditions. The three bottom cells contain the
intercorrelations between VC-CV results at different closure durations. The
pattern is very clear here: Responses at 20 and 60 msec were positively
correlated and so were, to a slightly lesser extent, the responses at 60 and
100 msec. Responses at 20 and 100 msec, however, were not significantly
related to each other. This pattern is similar to that obtained by Tartter et
al., who found a discontinuity between 25 and 50 msec of closure duration,
which suggested to them a qualitative change in the process of cue
integration. If such a change occurred in the present stimuli, it must have
happened right around 60 msec of closure duration, for the 60-msec data
correlated with both the 20-msec and the 100-msec data. Thus, while the
present data are less compelling, they are not incompatible with the findings
of Tartter et al.

Finally, consider the six correlations between monosyllable and
disyllable identification. There was considerable variability here, and only
one out of four conditions (SYM /b-d/) yielded any significant correlations at
all. The correlations in that condition suggest CV dominance at 20 msec and,
to some extent, at the 60-msec closure duration as well. Tartter et al. found
a very similar pattern in their symmetric /b-d/ stimuli. It is interesting
that precisely the condition most closely resembling the Tartter et al.
experiment yielded comparable correlational results. What is disturbing is
that at least two of the other conditions gave entirely different, possibly
random, patterns. Thus, even though the correlational findings of Tartter et
al. have been replicated, their generality is called into question by the
present data.

CONCLUSIONS

The present study served two purposes. First, it provided a replication
of Tartter et al. (Note 1). In the condition most closely resembling
Experiment II of Tartter et al. (/ba/-/da/, SYM stimuli), similar results were
indeed obtained, and two-stop responses proved to be infrequent. Therefore,
concerns about the quality of stimulus materials and about restrictions on
response choices in the Tartter et al. study can now be dismissed. Second,
the present investigation extended the Tartter et al. paradigm to asymmetric
VC-CV stimuli and to another phonetic contrast (/da/-/ga/). The results
obtained in these additional conditions show that the response patterns in any
particular condition have little generality. The relative perceptual weights
of the VC and CV transition cues and the effect of variations in closure
duration seem to depend strongly on the individual characteristics of the
stimuli.

Because of this lack of generality, only two very modest conclusions are
possible. One is that previous findings of strong CV transition dominance in
the perception of VC-CV stimuli with conflicting transitions do not apply to
the perception of stimuli with more nearly compatible transitions. The VC
transitions seem to play at least as important a role as the CV transitions in
these latter stimuli, which certainly are more representative of natural
speech. The other conclusion is that the perceptual integration of the VC and
CV formant transition and closure duration cues into a single stop consonant
percept seems to be an exceedingly complex business. This statement may be
taken as (admittedly weak) support for the view (Bailey & Summerfield, 1980)
that phonetic percepts are not computed by weighting and recombining
separately extracted cues, but that they are qualities derived from extended
acoustic patterns by a heuristic based on articulatory plausibility, i.e.,
general speech knowledge.
ACOUSTIC LARYNGEAL REACTION TIME: FOREPERIOD AND STUTTERING SEVERITY EFFECTS*
Ben C. Watson+ and Peter J. Alfonso+



Abstract. An earlier paper (Watson & Alfonso, 1982) presented a
model of the laryngeal reaction time (LRT) paradigm that included
several factors that appeared to affect LRT values. The present
study assesses the effects of two of these factors: foreperiod and
stuttering severity. The former was assessed by the use of thirteen
foreperiod durations. The latter was assessed by classifying
experimental subjects as either mild or severe stutterers. Both
factors significantly affected LRT values. More importantly, these
factors demonstrated a composite effect on group LRT differences.
Specifically, mild stutterers' LRT values approached normal values
as foreperiod increased, while severe stutterers' LRT values
remained significantly greater than normal values at all
foreperiods. Results are discussed in terms of differential
posturing and/or vibration initiation deficits underlying
stutterers' delayed LRT values. We caution that acoustic
measurements alone are insufficient to specify fully the nature of
the underlying deficits.

A number of experiments (most notably Adams & Hayden, 1976; Cross &
Luper, 1979; Cross, Shadden, & Luper, 1979; Starkweather, Hirschman, &
Tannenbaum, 1976) showed that stutterers as a group are significantly slower
than normals in initiating phonation in response to reaction signals. Using a
simple reaction time paradigm that allowed subjects one to three seconds to
prepare for a known response, we unexpectedly failed to replicate the results
of the above experiments (Watson & Alfonso, 1982). That is, we failed to find
a significant group difference in laryngeal reaction time (LRT) between
stutterers and nonstutterers, a difference we will refer to as the LRT effect.
However, we did find significant within-group LRT differences between auditory
and visual reaction signal conditions and between isolated vowel and
phrase-initial vowel response conditions. The latter results suggested to us
that our LRT measurements were indeed sufficiently sensitive to detect an LRT
effect if one existed. Other recent investigations have also failed to
demonstrate a significant LRT effect in both child and adult stutterers
(cf. Cullinan & Springer, 1980; Murphy & Baumgartner, 1981; Venkatagiri, 1981,
1982). The study reported here is motivated by our original experiment, as
well as other recent experiments that failed to demonstrate a significant LRT
effect.

*A portion of the data reported in this paper was first presented at the
annual convention of the American Speech-Language-Hearing Association,
Los Angeles, California, November 1981. A similar version of this paper will
appear in the Journal of Fluency Disorders.

+Also Department of Communication Sciences, University of Connecticut, Storrs,
CT 06268.

We are interested in isolating those factors that form the basis for
significant LRT differences between stutterers and their controls. To this
end, we have conducted experiments based on the model of the LRT paradigm
developed in our original experiment. The model includes factors related to
the perception of the reaction signal, production of the response, and factors
specifically related to characteristics of stuttering subjects that influence
LRT values. For example, we included in the model "reaction signal modality,"
a perceptual component, and "response type," a production component, based on
our findings of significant LRT differences for both nonstutterers and
stutterers as a function of reaction signal modality (visual vs. auditory) and
response condition (isolated vs. phrase-initial vowel).

There were two purposes to the study reported here. The first purpose
was to investigate further the effects of two other factors on stutterers' LRT
values as well as on the LRT effect. These factors are included in the model
as foreperiod and stuttering severity. We argued that our failure to find a
significant LRT effect in our original experiment was related to our use of
relatively long foreperiods and to the mild-to-moderate severity rating of our
experimental group.

The foreperiod factor is included in the "Perceptual Component" of the
model although production events may also occur during this interval. In our
experiments, foreperiod is defined as the interval between the presentation of
the warning cue and presentation of the phonate cue. Sufficiently long
foreperiods provide the subject with time to prepare for a known response
(Niemi & Näätänen, 1981). Preparatory activity that may occur during the
foreperiod includes perception of the warning cue, formulation and
transmission of appropriate motor commands to posture the speech mechanism for
the required response, and movements of the various components of the speech
mechanism to achieve the required pre-phonatory posture. The extent of
preparatory activity that actually occurs is a function of foreperiod
duration. Thus, short foreperiods may restrict preparatory activity to
perception of the warning cue and perhaps to formulation and transmission of
motor commands, while long foreperiods may permit formulation and transmission
of motor commands and posturing of the speech mechanism before presentation of
the phonate cue.

The notion of a foreperiod effect on nonstutterers' LRT values is
supported by Izdebski's (1980) observation of a U-shaped function when LRT
values are plotted across a range of increasing foreperiods. That is, he
found that LRT values decrease to a minimum as foreperiod increases to about
1500 msec and then increase as foreperiod increases beyond 1500 msec. These
results suggest that LRT values occurring at foreperiods less than 1500 msec
reflect the subject's inability to complete preparatory activity. Increasing
LRT values beyond 1500 msec may reflect the subject's inattention to the task
or failure to maintain the pre-phonatory posture. We have argued previously
that stutterers' LRT values may be particularly dependent upon foreperiod
duration. Specifically, we hypothesized that when certain stutterers are
given sufficient time to posture the speech mechanism, they will demonstrate
LRT values similar to those of normals. We concluded that the long
foreperiods used in our original experiment (one to three seconds) provided
stutterers with ample time to achieve the appropriate posture before the
initiation of phonation and contributed to our finding of a nonsignificant LRT
effect.

The studies referred to above that reported a significant LRT effect (and
used isolated vowels as the response, a task similar to one of the response
conditions in our original experiment) did not incorporate warning cues in
their experimental designs (cf. Adams & Hayden, 1976; Cross & Luper, 1979;
Cross et al., 1979). Consequently, it cannot be determined if the stutterers
in these experiments achieved the appropriate response posture before the
presentation of the phonate cue. Thus, experiments that report significant
LRT effects but do not include a warning cue may reflect stutterers'
difficulty with posturing the speech mechanism before phonation onset as well
as difficulties associated with initiating the response. It seems possible
that certain stutterers' delayed LRT values may be related to posturing, that
is, pre-phonatory events (as suggested by Freeman & Ushijima, 1978), while
other stutterers' delayed LRT values may be more directly related to
initiation of the response, or perhaps a combination of posturing and
initiation activities. If this is the case, one may suspect that certain
stutterers' LRT values will approach normal values as foreperiod increases.
However, other stutterers' LRT values could remain significantly greater than
normal values throughout the entire range of foreperiods. The first
hypothesis under test in this study states that there is a foreperiod effect
on stutterers' LRT values. To test this notion, we extended the range of the
foreperiods from 100 msec to 3000 msec. Specifically, those stutterers with
deficits only in posturing the speech mechanism will demonstrate LRT values
approaching normal values as foreperiod increases, while those stutterers with
deficits in initiating the response, or in both posturing and initiation, will
demonstrate LRT values significantly greater than normal values throughout the
range of short to long foreperiods.

The second factor that may affect stutterers' LRT values is stuttering
severity, included in the model under "Subject Characteristics." The results
of several studies (Hayden, 1975; Lewis, Ingham, & Gervens, Note 1; Watson &
Alfonso, 1982) suggest that mild stutterers may exhibit LRT values more
similar to normals than would severe stutterers. Additional support for this
notion is found in a comparison of results obtained in our original experiment
and in a study by Reich, Till, and Goldsmith (1981). The average severity
rating of our experimental group was mild-to-moderate. However, Reich et
al. (1981), using stuttering subjects classified as moderate-to-severe,
obtained a significant LRT effect. The experimental procedures were very
similar between the two studies. Both included foreperiods of similar
duration, for example, yet the results are clearly different. We suggest that
differences between the results of these studies may, in part, be attributable
to differences in the stuttering severity ratings of the experimental groups.
Finally, support for a stuttering severity effect on timing is found in data
reported by Borden (1982). Specifically, she observed that severe stutterers
displayed significantly longer vocal and manual "execution" time values than
nonstutterers, while none of the differences between mild stutterers and
nonstutterers reached significance. Thus, the second hypothesis under test is
that there is a stuttering severity effect on stutterers' LRT values. That
is, we expect that a group of severe stutterers will demonstrate greater LRT
values than will a group of mild stutterers.

Figure 1. Results of the stuttering severity analysis.

The two hypotheses described above assess the independent effects of
foreperiod and stuttering severity on stutterers' LRT values when compared to
nonstutterers. However, it would be interesting to determine the relationship
between foreperiod and stuttering severity. Consequently, the second purpose
of this study was to assess the combined effect of foreperiod and stuttering
severity on stutterers' LRT values. For example, we have hypothesized that
certain stutterers' LRT values could approach normal values as foreperiod
increases, in that these stutterers' delayed LRT values may be primarily
related to difficulty in posturing the speech mechanism. Alternatively, we
hypothesized that LRT values of other stutterers could remain significantly
different from normals throughout the entire range of foreperiods, implying
that these stutterers' delayed LRT values may be related to difficulty
initiating the response or, perhaps, a combination of posturing and initiation
difficulties. We would like to ascertain if groups of stutterers, classified
by severity, can be characterized according to the "posture" versus the
"initiation" hypothesis. That is, is it the case that mild stutterers'
primary difficulty is posturing the speech mechanism while severe stutterers'
primary difficulty is some combination of posturing and response initiation?
The third hypothesis tests this notion. Specifically, we expect that mild
stutterers' LRT values will approach normal values, while severe stutterers'
LRT values will remain significantly greater than normal values, as foreperiod
increases.

In summary, the first purpose of this study is to determine the effects
of two factors included in the model (Watson & Alfonso, 1982) of the LRT
paradigm on the LRT effect and on stutterers' LRT values. The second purpose
is to test the notion that qualitatively different deficits, posturing versus
initiation, underlie mild and severe stutterers' delayed LRT values.



METHOD

Subjects 

Subjects participating in this study included ten adult stutterers and
five adult nonstutterers. In order to test the effect of stuttering severity
on stutterers' LRT values, it was necessary to classify the experimental
subjects on this dimension. Stutterers were classified on the basis of three
separate analyses of severity. First, a certified Speech-Language Pathologist
subjectively rated severity of the stuttering subjects during conversational
speech and speech while reading the Rainbow Passage. A second certified
Speech-Language Pathologist objectively rated the same speech samples using
the Stuttering Severity Index (SSI) (Riley, 1972) and the Stuttering Interview
(SI) (Ryan, 1974).

The results of the stuttering severity analysis (shown in Figure 1)
indicate that the experimental subjects could be classified into two distinct
groups: five severe stutterers and five mild stutterers. Since reaction time
values may be affected by subject sex and age (Birren & Botwinick, 1955;
Izdebski, 1980; Weiss, 1965), we matched the control group against the average
age and sex ratio of the two stuttering groups.

Figure 2. One sequence of stimuli used to assess the effect of foreperiod on
LRT. The reaction signal varied from 100 to 3000 msec. Reaction
signal onset served as the warning cue, offset as the phonate cue.

Test Stimuli

Figure 2 illustrates one sequence of the stimuli used to assess the
effect of foreperiod on LRT values. Each sequence was separated by a variable
interstimulus interval (ISI) of eight to twelve seconds. ISIs of this
duration require that subjects breathe normally between response sequences.
Consequently, subjects are not able to remain in a phonatory position between
responses. The reaction signal consisted of the synthetic vowel /a/. Onset
of the reaction signal served as the warning cue and the offset served as the
phonate cue. Subjects were instructed to "get ready" to phonate when and only
when they heard the warning cue. Duration of the reaction signal varied from
100 msec to 500 msec in 100 msec increments, from 700 msec to 1500 msec in
200 msec increments, and from 2000 msec to 3000 msec in 500 msec increments, a
total of 13 foreperiods. A "terminate phonation" signal was presented two
seconds after the phonate cue. The terminate signal consisted of the
synthetic vowel /i/. Each of the 13 sequences was replicated five times,
randomized, and output onto audiotape using the Haskins Laboratories Pulse
Code Modulation (PCM) system.
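
The stimulus schedule described above might be generated along the following lines. This is a minimal sketch under stated assumptions: the first duration range is taken to run from 100 to 500 msec, which is what yields the stated total of 13 foreperiods, and the function names, seed, and uniform ISI draw are hypothetical rather than a description of the original PCM procedure.

```python
import random


def foreperiods_ms():
    """Foreperiod durations in msec (assumes the first range ends at 500 msec)."""
    short = list(range(100, 501, 100))     # 100..500 in 100-msec steps
    middle = list(range(700, 1501, 200))   # 700..1500 in 200-msec steps
    long_ = list(range(2000, 3001, 500))   # 2000..3000 in 500-msec steps
    return short + middle + long_


def build_trial_list(repetitions=5, isi_range_s=(8.0, 12.0), seed=0):
    """Return a randomized list of (foreperiod_ms, isi_s) pairs.

    Each foreperiod is replicated `repetitions` times; the ISI following each
    sequence is drawn uniformly from `isi_range_s` (an assumed distribution).
    """
    rng = random.Random(seed)
    trials = [fp for fp in foreperiods_ms() for _ in range(repetitions)]
    rng.shuffle(trials)
    return [(fp, rng.uniform(*isi_range_s)) for fp in trials]


trials = build_trial_list()
print(len(foreperiods_ms()), "foreperiods,", len(trials), "sequences")
```

With five replications of each foreperiod, every subject hears 65 response sequences.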

Procedures

Stimulus sequences were presented simultaneously to the subject, seated
in a soundproof booth, and to track one of a two-track tape recorder.
Subjects' responses were recorded on track two of the tape recorder. Subjects
were instructed to phonate the vowel /a/ immediately at the offset of the
reaction signal and to continue phonation until presentation of the terminate
signal. All subjects were allowed 21 training sequences, including long and
short foreperiods. Although most subjects required fewer than the maximum
number of training sequences to learn the relatively simple task, all subjects
were exposed to training sequences containing long and short foreperiods.
Response sequences were presented in two seven-minute tests separated by an
optional three- to five-minute rest interval.

Fluency Criteria 

We followed the same procedures used in our original experiment to ensure
that only fluent responses were analyzed. First, subjects were instructed to
identify any production that they thought was dysfluent. Second, the
experimenter noted any production that he thought was dysfluent. No responses
were omitted on the basis of the first two criteria. Finally, productions
were excluded from the data set if the waveform showed certain irregularities
that may be related to non-audible stuttering, such as isolated pitch pulses
before the onset of continuous phonation. As a result of the third criterion,
three responses were excluded from the mild stutterers' data set, one response
was excluded from the severe stutterers' data set, and no responses were
excluded from the nonstutterers' data set. Thus, 322 LRT values were measured
for mild stutterers, 324 values were measured for severe stutterers, and 325
values were measured for nonstutterers.
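
The response counts above follow from the design (13 foreperiods, five replications, five subjects per group) minus the exclusions under the third fluency criterion. The arithmetic can be checked with a short sketch; the variable and function names are illustrative only.

```python
FOREPERIODS = 13
REPETITIONS = 5
SUBJECTS_PER_GROUP = 5


def analyzed_responses(excluded):
    """Responses remaining for one group after waveform-based exclusions."""
    return FOREPERIODS * REPETITIONS * SUBJECTS_PER_GROUP - excluded


# Exclusions reported for the third fluency criterion.
exclusions = {"mild": 3, "severe": 1, "nonstutterers": 0}
counts = {group: analyzed_responses(n) for group, n in exclusions.items()}
print(counts)  # {'mild': 322, 'severe': 324, 'nonstutterers': 325}
```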






Data were analyzed with the aid of a computer waveform editing system at
Haskins Laboratories. Temporal resolution of the waveform analyzer is
accurate to one-tenth of a millisecond (Nye, Reiss, Cooper, McGuire,
Mermelstein, & Montlick, 1973). LRT values were defined as the interval
between the offset of the phonate cue and the onset of the first regular pitch
pulse of the voiced vowel /a/.
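
The LRT definition above can be illustrated with a small sketch. The original measurements were made by hand on a waveform editor, so this is not the authors' procedure. In particular, the criterion used here for a "regular" pitch pulse (the first pulse that begins a run of near-equal periods) and the parameters `run_length` and `tolerance` are assumptions introduced for illustration.

```python
def first_regular_pulse(pulse_times_ms, run_length=3, tolerance=0.2):
    """Return the time of the first pulse that starts a regular run.

    A run is treated as regular when `run_length` consecutive periods differ
    from one another by no more than `tolerance` (as a proportion of the
    shortest period in the run). Both parameters are hypothetical.
    """
    periods = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    for i in range(len(periods) - run_length + 1):
        window = periods[i:i + run_length]
        if max(window) - min(window) <= tolerance * min(window):
            return pulse_times_ms[i]
    return None


def laryngeal_reaction_time(cue_time_ms, pulse_times_ms):
    """LRT in msec, rounded to the 0.1-msec resolution cited in the text."""
    onset = first_regular_pulse(pulse_times_ms)
    if onset is None:
        return None
    return round(onset - cue_time_ms, 1)


# Hypothetical pitch-pulse times (msec from the start of a sequence).
pulses = [1668.3, 1676.3, 1684.4, 1692.4, 1700.5]
print(laryngeal_reaction_time(1500.0, pulses))
```

In the study itself, productions with isolated pitch pulses before continuous phonation were excluded under the fluency criteria rather than measured from a later pulse.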

Statistical Analyses

All data were subjected to several multiple correlation regression (MCR)
analyses (Cohen & Cohen, 1973) for the following reasons. First, the
procedure permits analysis of interaction effects between interval
(foreperiod) and nominal (stuttering severity) level independent variables, a
capability not provided by traditional multiple analysis of variance
procedures. Second, MCR analysis permits experimenter selection of specific
group comparisons. Finally, MCR analysis allows for the evaluation of
nonlinear relationships, such as the hypothesized relationship between
foreperiod and LRT. The statistical design used in this experiment was a
subjects within groups (normal, mild, severe) by condition (foreperiod)
repeated measures MCR. This design requires separate MCR analyses to
determine (1) the significance of the between-subject (stuttering severity)
main effect and (2) the within-subject (foreperiod) main effect and
interaction (stuttering severity x foreperiod) effect. The first MCR analysis
was conducted to determine the significance of the stuttering severity factor.
For this analysis, the subject group variable was coded to permit separate
comparisons between nonstutterers and mild stutterers and between mild and
severe stutterers. The second MCR analysis was conducted to determine the
significance of the foreperiod factor and the interaction between stuttering
severity and foreperiod. For this analysis, the subject group variable was,
once again, coded to permit comparisons between normals and mild stutterers as
well as between mild and severe stutterers. A third MCR analysis was
conducted to determine the magnitude of the nonlinear relationship between
foreperiod and LRT for each group in order to determine whether there was an
optimal foreperiod effect. Finally, comparisons between group mean LRT values
at each foreperiod were conducted using the nonparametric Randomization Test
for Independent Samples, since several of the criteria required by parametric
analyses were not fulfilled by these data (Siegel, 1956).
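
A randomization test for two independent samples can be sketched as follows. This is a generic exact version that enumerates every way of dividing the pooled scores into two groups and uses the difference in means as the test statistic; whether the original comparisons were one- or two-tailed is not stated, so the two-sided p value here is an assumption, and the LRT values are hypothetical.

```python
from itertools import combinations


def randomization_test(group_a, group_b):
    """Exact two-sided randomization test on the difference in group means.

    Returns (observed difference, p value), where p is the proportion of all
    possible regroupings whose absolute mean difference is at least as large
    as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = mean_diff(sum(group_a))
    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        diff = mean_diff(sum(pooled[i] for i in idx))
        count += 1
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / count


# Hypothetical mean LRT values (msec) for five subjects per group.
nonstutterers = [180, 195, 170, 188, 176]
severe = [260, 240, 275, 251, 233]
diff, p = randomization_test(nonstutterers, severe)
print(f"difference = {diff:.1f} msec, p = {p:.4f}")
```

With five subjects per group there are 252 possible regroupings, so the smallest attainable two-sided p value is 2/252, or about .008.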



RESULTS 

Figure 3 displays a summary of LRT values for the complete data
set.1 Each data point in this figure represents the average of all analyzed
responses per subject pooled across the five subjects in each group. LRT
values are expressed in group means and two standard deviation dispersions for
the three subject groups and 13 foreperiod conditions. Also shown are group
means and standard deviations collapsed across the 13 foreperiod conditions.
LRT values for nonstutterers are shown as closed circles, for mild stutterers
as open circles, and for severe stutterers as open triangles. Note that this
figure demonstrates that LRT varies as a function of subject group and
foreperiod.

Figure 3. Acoustic LRT values in group means and standard deviation
dispersions for the 13 foreperiod conditions. Each data point represents
the individual subject averages pooled across the five subjects in
each group.
The first two hypotheses in this study predicted foreperiod and
stuttering severity effects on LRT. The results of MCR analyses of these main
effects as well as the stuttering severity by foreperiod interaction effect
are summarized in Table 1. This table shows that both the stuttering severity
and foreperiod factors are significant (p < .01).

Partial regression coefficients obtained from the between-subjects MCR
are presented in Table 2. Coefficients for both the nonstutterer versus mild
stutterer and mild versus severe stutterer group comparisons were significant
(p < .01). These results indicate that the three groups' LRT values were
significantly different when collapsed across the 13 foreperiod conditions.

Table 3 shows results of analyses of the power of the polynomial
describing the relationship between foreperiod and LRT for each subject group.
Second-order polynomials were found for the nonstutterers and mild stutterers.
That is, LRT values for these subjects decrease to a minimum and then increase
as foreperiod increases. A nonlinear relationship between foreperiod and LRT
was also reported by Izdebski (1980) following analysis of a reduced data set.
He found, using only normal subjects, a second-order relationship between
foreperiod and LRT. However, our data indicate that the relationship between
LRT and foreperiod for severe stutterers is different. For these subjects,
Table 3 shows that a third-order polynomial also becomes significant and
approaches the second-order term in best describing the shape of the curve.
This implies that LRT values for severe stutterers tend to decrease to a
minimum, then increase to a maximum, and then decrease again as foreperiod
increases. These results emphasize the difference between severe stutterers
versus mild stutterers and nonstutterers. For example, the data shown in
Figure 3 for mild stutterers and nonstutterers show single maximum and minimum
values, yielding a single inflection point in the curve. A curve of predicted
LRT values, representing least-squared deviations, was obtained by solving
regression equations for each group. Analysis of predicted curves indicates
that the inflection points for nonstutterers and mild stutterers occur at 2000
and 1500 msec, respectively. For severe stutterers, there is less difference
between maximum and minimum LRT values and the curve has two inflection
points, 900 and 2500 msec. Note also that the fastest LRT for nonstutterers
occurred at a foreperiod of 2000 msec, consistent with the results reported by
Izdebski (1980). For the severe stutterers, fastest LRT values occurred at a
foreperiod of 500 msec. The foreperiod at which the fastest LRT value
occurred for mild stutterers is less clear, but seems to be around 1300 msec.
Thus, minimum LRT values also seem to vary as a function of group membership.
Finally, it appears that foreperiod has a greater effect on the maximum and
minimum LRT values of nonstutterers and mild stutterers than it does for the
severe stutterers' LRT values.

To summarize, the results reported thus far support the first two
hypotheses of this study. That is, both the stuttering severity factor and
foreperiod factor were shown to affect LRT values significantly. In addition,
partial regression coefficients revealed that the stuttering severity main
effect reflects significant group differences between nonstutterers and mild
stutterers as well as between mild and severe stutterers when LRT values are
collapsed across the 13 foreperiods. Finally, foreperiod has a greater effect
on nonstutterers' and mild stutterers' LRT values than on severe stutterers'
LRT values.
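
The power-polynomial analysis summarized in Table 3 can be approximated by entering linear, quadratic, and cubic foreperiod terms in order and testing the increment in R squared at each step. The sketch below fits one curve to 13 hypothetical group means, which reproduces the degrees of freedom in Table 3 (1,11; 1,10; 1,9) but is not the repeated-measures MCR design actually used; the data, function names, and the choice to express foreperiod in seconds are all assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(x, y, order):
    """R squared of a least-squares polynomial of the given order."""
    k = order + 1
    xtx = [[sum(v ** (i + j) for v in x) for j in range(k)] for i in range(k)]
    xty = [sum((v ** i) * w for v, w in zip(x, y)) for i in range(k)]
    coef = solve(xtx, xty)  # coefficients in ascending power order
    mean_y = sum(y) / len(y)
    ss_tot = sum((w - mean_y) ** 2 for w in y)
    ss_res = sum((w - sum(c * v ** i for i, c in enumerate(coef))) ** 2
                 for v, w in zip(x, y))
    return 1 - ss_res / ss_tot


def trend_analysis(x, y, max_order=3):
    """Incremental R squared and F (with df) for each successive power term."""
    n, rows, previous = len(x), [], 0.0
    for order in range(1, max_order + 1):
        r2 = r_squared(x, y, order)
        df_resid = n - order - 1
        f = (r2 - previous) / ((1 - r2) / df_resid)
        rows.append((order, r2 - previous, f, (1, df_resid)))
        previous = r2
    return rows


# Foreperiods in seconds and hypothetical U-shaped group mean LRTs (msec).
foreperiod_s = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 2.0, 2.5, 3.0]
mean_lrt_ms = [230, 222, 214, 209, 203, 196, 190, 188, 187, 189, 195, 204, 215]

for order, inc_r2, f, df in trend_analysis(foreperiod_s, mean_lrt_ms):
    print(f"order {order}: incremental R2 = {inc_r2:.2f}, F{df} = {f:.2f}")
```

A large quadratic increment with a negligible cubic increment corresponds to the single-minimum pattern described for nonstutterers and mild stutterers; a sizable cubic increment corresponds to the pattern described for severe stutterers.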
                                  Table 1

                 Summary of Main and Interaction Effects

                                          F           df

    Main Effects
      Stuttering severity                8.88**      2, 12
      Foreperiod                         2.8**       12, 144

    Interaction Effect
      Stuttering severity
        by foreperiod                     .415       24, 144

    **F.99 (2,12) = 6.95
    **F.99 (12,144) = 2.31
     *F.95 (24,144) = 1.59



Table 2 

Partial Regression Coefficicmts for Stuttering 
Severity PaCTOr 

Comparison ^ — ~ 

Honstutterers vs. 

mUd stutterers ' -57.36 14.19** 1t13 



Hild stutterers vs. 

severe stutterers -53.59 12. 39** > 1,13 

- ••P.99 (M3) ■ 9.07 




Table 3

Summary of Power Polynomial Analysis of Foreperiod

Group               Power term    Inc. R sqr.    F            df

Nonstutterers       linear        .27            4.17         1, 11
                    quadratic     .30            8.10*        1, 10
                    cubic         .06            1.45         1, 9

Mild stutterers     linear        .17            2.27         1, 11
                    quadratic     .33            11.61**      1, 10
                    cubic         .16            4.64         1, 9

Severe stutterers   linear        .11            1.39         1, 11
                    quadratic     .36            13.33**      1, 10
                    cubic         .26            8.66*        1, 9

*F.95 (1,11) = 4.84
*F.95 (1,10) = 4.96
**F.99 (1,10) = 10.04
*F.95 (1,9) = 5.12

The third hypothesis stated that there was a difference between
nonstutterers' and stutterers' (grouped by severity) LRT values as a function
of foreperiod. Our original experiment revealed nonsignificant differences
between nonstutterers and mild-moderate stutterers at 1, 2, and 3 second
foreperiods. Hence, in the present study, we expected to find significant
differences between nonstutterers' and mild stutterers' LRT values only at
foreperiods less than 1100 msec. Conversely, we expected to find significant
differences between nonstutterers' and severe stutterers' LRT values at both
short and long foreperiods. These hypotheses were tested by conducting
post-hoc group comparisons by using the Randomization Test for Independent
Samples. Results of these comparisons are shown below the abscissa in Figure
3. The symbol N refers to nonstutterers, and the symbols M and S refer to mild
and severe stutterers, respectively. A solid line connecting groups indicates
no significant difference between group means. Results of this analysis reveal
that severe stutterers' LRT values are significantly greater than
nonstutterers' at all of 13 foreperiods (p < .05). On the other hand, mild
stutterers' LRT values are significantly greater than nonstutterers' at only 5
of the first 7 foreperiods, that is, at foreperiods less than 1100 msec.
However, we unexpectedly found significant LRT differences between
nonstutterers and mild stutterers at 4 of 6 foreperiods equal to and greater
than 1100 msec. Thus, results of group comparisons as a function of foreperiod
clearly support our hypothesized differences between nonstutterers' and severe
stutterers' LRT values, but only partially support our hypothesized
differences between nonstutterers' and mild stutterers' LRT values. In
general, these results demonstrate that mild stutterers' LRT values approach
those of nonstutterers as foreperiod increases, while severe stutterers' LRTs
remain significantly greater than nonstutterers' throughout the entire range
of foreperiods.
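
The group comparisons above rely on a randomization test for two independent
samples. A minimal sketch of the exact form of such a test is given here in
Python, assuming a two-sided comparison of group means. The function name and
the LRT values are hypothetical illustrations, not the procedure or data of
the study.

```python
from itertools import combinations


def randomization_test(group_a, group_b):
    """Exact two-sided randomization test on the difference in group means.

    Every way of splitting the pooled scores into groups of the original
    sizes is enumerated; the p-value is the share of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)

    def mean_diff(indices_a):
        chosen = set(indices_a)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(range(n_a)))
    diffs = [abs(mean_diff(c)) for c in combinations(range(n_total), n_a)]
    # Small tolerance so that splits tied with the observed one are counted.
    extreme = sum(1 for d in diffs if d >= observed - 1e-12)
    return extreme / len(diffs)


# Hypothetical LRT values (msec) for two groups of five subjects each.
nonstutterers = [210, 225, 198, 240, 215]
severe = [290, 310, 275, 330, 305]
p_value = randomization_test(nonstutterers, severe)
```

With five scores per group there are 252 possible splits, so completely
separated groups such as these give the smallest attainable two-sided
probability, 2/252 (about .008).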
DISCUSSION

The first interesting finding of the present study is that of a
significant stuttering severity factor. This finding is consistent with
reaction time data for complex vocal and manual responses reported by Borden
(1982). Using the same stuttering subjects used in the present study and a
constant foreperiod equal to one second, she observed significant group
differences between nonstutterers and severe stutterers for the execution of
perceptually fluent counting and finger tapping responses. Differences
between nonstutterers and mild stutterers for the same tasks failed to reach
significance. Thus, results of this and the Borden study indicate that
stuttering severity affects group timing differences for both simple and
complex vocal responses as well as for manual responses. Furthermore, these
results suggest that the delayed LRT values demonstrated by the experimental
subjects may represent an underlying deficit in general motor control in
stutterers as a group. Finally, these results suggest that the magnitude of
the delay, and correspondingly the magnitude of the deficit, is reflected in
the stuttering severity rating. Of course, acoustic measurements alone do not
permit analysis of the motor control processes occurring before the onset of
the acoustic response. Later in this discussion, we will suggest procedures
that may allow analysis of motor control processes during posturing and
response onset.

Perhaps the most interesting and important finding of this study is the
composite effect of the stuttering severity and foreperiod factors on the
significance of group LRT differences between stutterers and nonstutterers.
Specifically, we observed that mild stutterers' LRT values approach normal
values as foreperiod increases, while severe stutterers' LRT values are
significantly greater than normal values throughout the range of foreperiods.
These results are in general agreement with the findings of our original LRT
experiment. That study failed to show significant group LRT differences
between nonstutterers and a group of mild to moderate stutterers for
foreperiods equal to one, two, and three seconds. Although the present study
reports nonsignificant differences at only 2 of 6 foreperiods in this range,
it should be pointed out that the results of the present study reflect fewer
subjects per group, fewer responses per subject, and the use of nonparametric
statistics. With these differences aside, the present study supports our
original experiment in that the differences between mild stutterers' and
nonstutterers' LRT values are significantly less than the differences between
nonstutterers' and severe stutterers' LRT values.

Throughout this paper, we have noted that long foreperiods permit
subjects to complete activity required to posture the speech mechanism for the
voiced response. Consequently, the finding that mild stutterers' LRT values
approach normal values as foreperiod increases, whereas severe stutterers' LRT
values do not, suggests that different deficits may contribute to delayed LRT
values for the two groups of stutterers. Specifically, with regard to the
comparisons between nonstutterers and mild stutterers, our results generally
support the hypothesis that mild stutterers' primary difficulty is posturing
the speech mechanism. However, it is also likely that our mild stutterers
have some difficulty initiating vibration, since their LRT values do not
become identical with those of the nonstutterers. Results of the comparisons
between nonstutterers and severe stutterers as a function of foreperiod
suggest that these stutterers may have both posturing and vibration initiation
deficits.

Figure 4. Posture and initiation components of the laryngeal reaction time
response.

Reaction time responses have been studied with respect to their premotor
and motor components (Botwinick & Thompson, 1966). Following this example, we
have chosen to study the posture and initiation components of the reaction
time response in an attempt to understand better the qualitative differences
in the deficits underlying stutterers' delayed LRT values. These components
are schematically represented in Figure 4. The posture component is
represented by a series of processes related to perception of the warning
and/or phonate cue, formulation and transmission of neuromotor commands to
posture the speech mechanism, posturing of the speech mechanism for the
required response, and formulation of neuromotor commands to initiate the
response. The formulation of neuromotor commands for initiation may occur
simultaneously with the formulation and transmission of neuromotor commands
for posturing. Postural processes are also taken to include pre-phonatory
gestures. The initiation component is represented by processes related to the
transmission and execution of neuromotor commands for the response. The
consequences of executing these commands are: (1) muscular adjustments, (2)
articulator movement, and finally, (3) acoustic output. Figure 4 demonstrates
the special case in which foreperiod duration permits completion of all
postural activity prior to the presentation of the phonate cue.

The interval required for perceptual processing of the warning and
phonate cue will vary as a function of stimulus modality and intensity
(Elliot, 1968; Murray, 1970; Watson & Alfonso, 1982). There is conflicting
evidence regarding the effect of stimulus modality on the LRT effect. For
example, significant group reaction time differences between stutterers and
nonstutterers have been reported for auditory but not for visual stimuli by
McFarlane and Prins (1978) and McFarlane and Shipley (1981). Conversely,
Watson and Alfonso (1982) failed to find significant between-group LRT
differences for auditory or visual stimuli. Thus, it is not conclusive
whether stimulus modality influences the LRT effect. However, Kohfeld (1971)
has shown that stimulus modality and intensity parameters interact in a
complex manner and, more importantly, that cross-modality reaction time
differences may reflect the failure of experimenters to insure that visual and
auditory stimuli are presented at psychophysically equal intensity levels. In
addition, cognitive and affective factors, such as instructions to the subject
and the experimental setting (Murray, 1970), as well as a variable foreperiod
(Niemi & Lehtonen, 1982) may interact with stimulus parameters to alter the
duration of perceptual processes. Thus, the duration of the perceptual
processing interval is determined by several variables. The effects of
stimulus-related variables may be reduced by maintaining constant stimulus
modality and intensity parameters for all subjects. Though it is not possible
to measure the duration of perceptual processes in humans directly, Wall,
Remond, and Dobson (1953) provide an estimate of this interval based on
physiological data obtained from anesthetized animals. Recording electrical
activity in pyramidal tract neurons in the motor cortex, they observed a
latency of approximately 50 msec between the onset of a visual stimulus and
the onset of neural activity. These data suggest that the contribution of
perceptual processing activity to overall LRT values may be relatively small.
To summarize, it is not possible to measure the duration of perceptual
processes directly. However, by controlling stimulus intensity and modality
parameters, the duration of this interval may be held relatively constant
across subjects.

The interval required for the completion of neuromotor processes (i.e.,
formulation and transmission of appropriate neural commands to the peripheral
musculature) may also contribute minimally to overall LRT values in the simple
reaction time paradigm. Estimates of formulation latencies are not available
for human subjects. However, the transmission velocity of neural impulses
along the recurrent laryngeal nerve is approximately 56 meters/second in
nonstutterers (Flisberg & Lindholm, 1970). This value, in addition to a
residual latency of 1.5 to 2.5 msec due to synaptic junctions and the
decreasing diameter of peripheral nerve fibers (Basmajian, 1970), yields an
estimated maximum transmission latency in nonstutterers of approximately 3.0
msec. Thus it appears that although the duration of perceptual and neuromotor
processing components in the LRT paradigm cannot be directly measured, it is
likely that the contribution of both of these processes to group LRT
differences is relatively insignificant.
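
The arithmetic behind the transmission estimate can be sketched as conduction
time plus residual latency. The Python fragment below is only an
illustration: the text gives the velocity, the residual range, and the 3.0
msec maximum, but not the conduction path length, so the path length here is
back-calculated on the assumption that the maximum pairs with the 2.5 msec
residual. The function names are hypothetical.

```python
def transmission_latency_ms(path_length_m, velocity_m_per_s, residual_ms):
    """Conduction time along the nerve plus a fixed residual latency (msec)."""
    return path_length_m / velocity_m_per_s * 1000.0 + residual_ms


def implied_path_length_m(total_latency_ms, velocity_m_per_s, residual_ms):
    """Path length (meters) that would produce the given total latency."""
    return (total_latency_ms - residual_ms) / 1000.0 * velocity_m_per_s


VELOCITY = 56.0      # m/s, from the text
RESIDUAL_MAX = 2.5   # msec, upper end of the residual range in the text
MAX_LATENCY = 3.0    # msec, the maximum estimate quoted in the text

# Assumption: the 3.0 msec maximum pairs with the 2.5 msec residual,
# leaving about 0.5 msec for conduction itself.
path = implied_path_length_m(MAX_LATENCY, VELOCITY, RESIDUAL_MAX)
check = transmission_latency_ms(path, VELOCITY, RESIDUAL_MAX)
```

Under that assumption the conduction portion is about 0.5 msec, which at 56
meters/second corresponds to a path of roughly 0.028 meters. Either way, the
total is small relative to the group LRT differences under discussion.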

Posturing the speech mechanism for the onset of an isolated, voiced vowel
requires muscular adjustments in the respiratory, laryngeal, and articulatory
systems. In the respiratory system, these adjustments result in the
optimization of thoracic muscle tension. Optimal muscle tension levels, in
turn, facilitate rapid generation of sufficient subglottal pressure for
phonation initiation (Baken, Cavallo, & Weissman, 1979). In the laryngeal
system, muscular adjustments modify vocal fold tension and position to
facilitate phonation. Articulatory adjustments result in achievement of
supralaryngeal vocal tract postures appropriate for the required response
(e.g., the isolated vowel /a/). We assume that posturing activity within these
systems will occur simultaneously. Furthermore, it is likely that the nature
of the posturing activity within any system is, in part, a function of the
qualitative interaction between systems. For example, there may be differences
in respiratory and laryngeal coupling for the onset of voiced versus voiceless
vowels. In addition, articulatory postures may affect laryngeal posturing
(i.e., constricted versus open vocal tract configurations).

In the aerodynamic domain, respiratory posturing also occurs with respect
to lung volume. For example, Izdebski and Shipp (1978) have shown that a lung
volume of approximately 50% vital capacity yields faster LRT values than do
pre-phonatory lung volumes of 25% and 75% vital capacity. In addition,
Hoshiko (1965) found that nonstutterers usually initiate phonation from about
50% vital capacity. Thus, this value appears to represent an optimal lung
volume for the initiation of vocal fold vibration.


It is also true that LRT values are affected by processes included in the
initiation component. These include transmission and execution of initiation
neuromotor commands, muscle contraction, coordinated movement of speech
structures, and finally, generation of the resultant acoustic output. Reaction
time measurements of the latter three processes can be obtained and are
illustrated in Figure 4.

Lastly, we should emphasize that posturing deficits in stutterers would
delay initiation of the response. For example, the latency of vibration onset
for stutterers may be prolonged if the vocal folds are "hyper-postured," that
is, postured with excessive tension and adduction, or abnormally postured
(i.e., simultaneous adduction and abduction, cf. Freeman & Ushijima, 1978).
Hyper-postured vocal folds would likely result in abnormally high levels of
glottal resistance and, therefore, the need for higher levels of subglottal
pressure, while abnormally postured vocal folds would prevent the accumulation
of sufficient subglottal pressure to initiate vibration. Finally, markedly
constricted articulatory postures increase supraglottal pressures and, thus,
may prolong vibration onset latencies. The point we wish to make is that the
delayed reaction time values in these instances would reflect postural rather
than initiation deficits.

We assume that the contribution of perceptual processes in this study to
between-group differences was insignificant since stimulus modality and
intensity parameters were held relatively constant for all subjects. In
addition, it is likely that the contribution of neuromotor processes to the
LRT effect was insignificant. The finding that mild stutterers' LRT values
approach those of nonstutterers as foreperiod increases suggests that the
primary difficulty for this group of stutterers is related to posturing the
speech mechanism. However, since LRT values for mild stutterers did not become
identical with those of nonstutterers, it is also possible that these
stutterers have some degree of difficulty initiating vibration as well. The
effect of foreperiod on severe stutterers' LRT values is different. The
finding that severe stutterers' LRT values fail to approach those of
nonstutterers as foreperiod increases suggests that severe stutterers may have
difficulty in both posturing the speech mechanism and initiating vocal fold
vibration. What is important is that the underlying deficit may be
qualitatively different between mild and severe stutterers. Unfortunately, LRT
measures obtained from acoustic analysis alone do not permit precise
specification of the loci of deficits in phonation onset activity in these
stutterers. For example, it is possible that mild stutterers have the same
type of deficits as do severe stutterers but to a lesser degree. Thus, we feel
that we have made the most of acoustic measures of LRT. That is, we need to
investigate those activities that occur before the onset of voicing.


The advantage of obtaining simultaneous measures in the acoustic,
movement, and EMG domains is discussed by Baer and Alfonso (in press). They
suggest that simultaneous measures may be particularly informative in LRT
experiments because they provide information regarding activity prior to onset
of the acoustic signal corresponding to vocal fold vibration. For example,
the combined duration of perceptual and neuromotor processes may be inferred
from EMG signals recorded from intrinsic laryngeal muscles. That is, the
latency between the offset of the warning signal and the onset of the EMG
signal in the laryngeal muscles may yield an estimate of the time required to
complete perceptual and neuromotor processes. In addition, EMG measures may
be useful in documenting the latency of onset, synergy, and amount of muscular
activity during pre-phonatory posturing of the speech system as well as during
generation of subglottal pressure by the respiratory system. Direct
observation of chest wall and vocal fold movements, via Respitrace (Cohn et
al., Note 2) and transillumination instrumentation, respectively, may also
provide information regarding the amount and coordination of respiratory and
laryngeal posturing activity as well as the interaction between laryngeal
posturing and respiratory system activity during the generation of subglottal
pressure.

In conclusion, the results of the present study support the results of
our original experiment by demonstrating a significant stuttering severity
effect. Furthermore, the present results support the notion that mild and
severe stutterers' prolonged LRT values may reflect differential deficits in
posturing and/or vibration initiation. We recognize, however, that acoustic
analyses alone will not specifically reveal the nature of deficits
contributing to stutterers' delayed LRT values. We plan future LRT
experiments incorporating simultaneous measures in the acoustic, movement, and
EMG domains. Only through the use of simultaneous measures can the nature of
deficits underlying stutterers' often reported difficulty in initiating
phonation be systematically described.
FOOTNOTE

¹The present study reports results obtained from statistical analysis of
the complete data set. In so doing, it is consistent with most LRT studies
comparing nonstutterers with stutterers. However, two procedures are
sometimes used to eliminate the maximum and minimum LRT values prior to group
comparisons. The rationale for either of these procedures is that LRT values
significantly faster than the mean reflect anticipatory responses occurring
before the phonate cue, while values significantly slower than the mean
reflect the subjects' inattention to the task. As an example of one
procedure, Izdebski and Shipp (1978) and Izdebski (1980) used statistical
tests to eliminate only significant outliers. As an example of the second
procedure, Reich et al. (1981) omitted the fastest and slowest responses of
each subject before group comparisons. In a forthcoming paper, we will
discuss the effects of various data reduction procedures on the LRT effect.
DISINHIBITION OF MASKING IN AUDITORY SENSORY MEMORY*

Robert G. Crowder+



Abstract. A series of experiments was performed on the difference
between single- and double-masking agents in auditory memory: Single
or double suffixes were presented following immediate memory lists,
with parametric variation in the delay of the suffixes relative to the
end of the list. The main interest was in the shape of the masking
function produced by the timing of either the single suffix or the
second of two suffixes. Disinhibition was shown to occur, although it
was weak in absolute magnitude.

The purpose of this report is to provide further information on the
occurrence of disinhibition in auditory memory. Disinhibition is a term that
describes a particular experimental result that occurs when a second
interfering or masking event leads to better performance on some target
information than would have been obtained with only a single mask. Crowder
(1978) reported disinhibition in immediate memory after finding that a series
of three suffixes (extra words) following auditory memory-span lists led to
better performance on the last list item than did only a single suffix. This
finding was interpreted within the framework of a model for auditory memory
that assumes a grid-like representation following rules for lateral
inhibition. In the sections that follow, other references to disinhibition in
psychology will be reviewed and then the Crowder (1978) model will be
described.

Disinhibition in Cognitive Psychology 

The theoretical and empirical status of disinhibition has been worked out
very completely for the retinal cells of the horseshoe crab (Ratliff, 1965).
These retinal cells form a two-dimensional grid in which it is possible to
deliver light stimuli to, and record electrical activity from, individual
cells. Disinhibition is a property of a certain form of lateral inhibition.
Therefore, the first step in explaining disinhibition is to describe how
lateral inhibition works.




*Also in Memory & Cognition, 1982, 10, 424-433.
+Also Yale University.

Figure 1. Nonrecurrent and recurrent lateral inhibition networks. Unit A is
considered the target and units B and C the masks.

For lateral inhibition, the important pattern of results is that the
firing of a unit to stimulation is reduced when a neighboring unit is also
being stimulated at the same time. This lateral inhibition is explained by
the assumption that units send not only excitatory messages to the next stage
of organization, but also inhibitory messages to neighboring units at the same
stage. The degree of lateral inhibition is related to how far the two units
are from each other. At very short distances, the two units' activities seem
to combine rather than to inhibit each other. At great distances, two units
behave independently, that is, one unit responds the same to its stimulation
whether or not there is another active unit at a distance. The greatest
lateral inhibition is found at an intermediate spacing on the retinal mosaic.
It is not important what these distances are in real units; the important
point is the inverted U-shaped masking function based on the distance of the
target and masking cells.

Figure 1 shows two forms of lateral inhibition for three hypothetical
units, A, B, and C. These units are simultaneously emitting excitatory
impulses (+) to the next level and also inhibitory impulses (-) to each other.
In both types of lateral inhibition, nonrecurrent and recurrent, the firing of
A will be reduced by the simultaneous activity of B. However, there is an
important difference between the two inhibitory circuits, a difference that is
fundamental to the concept of disinhibition. In nonrecurrent lateral
inhibition, the damage to one unit caused by the other is not related to how
much the first unit has itself been inhibited. That is, the amount that A is
inhibited by B depends only on how active B is before being inhibited by A.
In recurrent lateral inhibition, the amount of damage that B can cause A
already reflects the damage that A has caused B. In other words, in the
recurrent model, the inhibitory effect of one unit impinges on a neighbor
above the point at which the neighbor branches out and sends inhibition back
to the original unit.

Disinhibition is a property of recurrent, but not of nonrecurrent,
lateral inhibition. To see this, consider a third unit C, in Figure 1,
connected to A and B according to either arrangement. Assuming our interest
is in the firing of Unit A, we can add activity in B, noting a reduction in
the activity of A. This is the case with either arrangement from Figure 1 and
it establishes that A and B are related by some form of lateral inhibition.
The next question is what will happen as a consequence of making the third
unit, C, active. In nonrecurrent lateral inhibition, the activity in C will
certainly reduce the output of B, but it will not influence the amount of
inhibition coming from B to A. This is because the inhibition fed by B to A
has already been sent out before the unit C contacts B. In recurrent lateral
inhibition, however, the activity of C will inhibit B before B has sent out
its inhibitory influences. This means that C will reduce the ability of B to
inhibit A. Thus, with recurrent lateral inhibition, a mask applied to a mask
(C applied to B) should increase activity of the target (A). This is the
defining outcome for disinhibition.
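
The contrast between the two circuits can be made concrete with a small
numerical sketch in Python. It is an illustration under stated assumptions,
not the Ratliff equations or the model developed in this report: inhibition is
taken to be linear with a single hypothetical coefficient K, outputs are
clipped at zero, and the wiring is assumed to be a chain in which A and B
inhibit each other and B and C inhibit each other.

```python
def nonrecurrent(excitation, neighbors, k):
    """Each unit is inhibited by its neighbors' excitation before inhibition."""
    return {
        u: max(0.0, e - k * sum(excitation[v] for v in neighbors[u]))
        for u, e in excitation.items()
    }


def recurrent(excitation, neighbors, k, iterations=200):
    """Each unit is inhibited by its neighbors' outputs, found by iteration."""
    out = dict(excitation)
    for _ in range(iterations):
        # Synchronous update: every unit reads the previous pass's outputs.
        out = {
            u: max(0.0, e - k * sum(out[v] for v in neighbors[u]))
            for u, e in excitation.items()
        }
    return out


# Assumed wiring: A and B inhibit each other, B and C inhibit each other.
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
K = 0.4  # hypothetical inhibitory coefficient

one_mask = {"A": 1.0, "B": 1.0, "C": 0.0}   # target A plus mask B
two_masks = {"A": 1.0, "B": 1.0, "C": 1.0}  # second mask C added

for name, model in (("nonrecurrent", nonrecurrent), ("recurrent", recurrent)):
    a_one = model(one_mask, NEIGHBORS, K)["A"]
    a_two = model(two_masks, NEIGHBORS, K)["A"]
    print(f"{name}: A with B only = {a_one:.3f}, A with B and C = {a_two:.3f}")
```

With these assumed values the nonrecurrent output of A is 0.600 whether or not
C is active, whereas the recurrent output of A rises from 0.714 to 0.882 when
C is added, which is the disinhibition pattern described above.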

The limited, scattered literature based on these ideas in psychology
encompasses three broad approaches to application of the model:
electrophysiological, theoretical, and behavioral. In the auditory domain,
electrophysiological work by Galambos and Davis (1944) established analogues
of the "receptive fields" that were later demonstrated by Hubel and Wiesel
(1962) in cats' visual systems. The center-surround organization of these
networks includes the same logic outlined above and verified for the horseshoe
crab retina.

Figure 2. Results of the Deutsch and Feroe (1975) experiment. The
performance measure is errors on "same" trials, given as a function of
the tonal separation between tones. In the lower function, the
separation is between the standard tone (T1) and a single interfering
tone (I2); in the upper function, the separation is between the
first and second of two interfering tones (I2 and I4, respectively).

Theoretical explorations of the lateral-inhibition and disinhibition
ideas have included abstract investigations of the mathematical properties of
systems following the Ratliff (1965) equations (Berman & Stewart, 1978) and
also some psychological theorizing. Milner (1957) found it necessary to
include lateral inhibitory assumptions in his realization of Hebb's cell
assembly theory, for example. More recently, Walley and Weiden (1973) have
offered a theory of selective attention deriving from concepts of lateral
inhibition.

In human perception, there have been at least two areas in which
disinhibition as the name of an experimental result has been observed. In
visual masking, reports by Robinson (1966) and Dember and Purcell (1967)
established disinhibition in tachistoscopic research. In one kind of
experiment, a faint disk can be inhibited by a surrounding ring but
disinhibited by a second ring that surrounds the first ring. There continues
to be a lively interest in this phenomenon (e.g., Bryon & Banks, 1980; Turvey,
1973). However, an isolated report by Deutsch and Feroe (1975) is most
relevant to the issues at hand because it shows disinhibition within the
domain of auditory short-term memory. The Deutsch and Feroe experiment will be
considered in some detail in order to set the context for the present
research.


The Deutsch and Feroe (1975) study. Deutsch and Feroe asked subjects for
same-different judgments on pairs of tones (a standard and a comparison tone)
that were either identical or were .5 whole-tone steps apart. (A whole-tone
step is equivalent to two keys on the piano separated by exactly one other
key, without regard for black or white. In terms of hertz, the ratio of notes
a whole-tone step apart is 1.125:1.000.) To make the task nontrivial, they
interpolated six interference tones between the standard and the comparison.
The interpolated tones never came within 1.5 whole-tone steps of the standard
in their baseline or control condition.

In one experiment, the second of the six interference tones was allowed
to come close to the standard and comparison tones, however. This critical
interference tone was either, in different conditions, the same as the first
(standard) tone, or 1/6, 2/6, 3/6, 4/6, 5/6, or 6/6 of a whole-tone step away
from it. Thus, the second interference tone was deliberately made similar to
the standard and comparison tones.
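
To give a feel for how small these separations are, the sketch below converts
sixths of a whole-tone step into frequencies, using the 1.125:1 ratio quoted
above. It assumes that the fractional steps are equal on a logarithmic
frequency scale and lie above the standard; the 440-Hz standard and the
function name are hypothetical, not values from Deutsch and Feroe.

```python
WHOLE_TONE_RATIO = 1.125  # ratio for a whole-tone step, as given in the text


def tone_at(standard_hz, steps):
    """Frequency lying `steps` whole-tone steps above the standard.

    Assumes fractional steps are equal on a logarithmic frequency scale.
    """
    return standard_hz * WHOLE_TONE_RATIO ** steps


STANDARD_HZ = 440.0  # hypothetical standard tone
separations = [n / 6 for n in range(7)]  # 0, 1/6, ..., 6/6 whole-tone step
critical_tones = [tone_at(STANDARD_HZ, s) for s in separations]
```

On these assumptions the seven conditions span 440 Hz to 495 Hz, so adjacent
conditions differ by only about 9 Hz.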

The results of this comparison can be described in terms of errors on
"same" trials. When the critical second interference tone was identical to
the standard (and also identical to the comparison, since only "same" trials
are under consideration), performance was better than in the control
condition, in which all six of the interference tones were at least 1.5 steps
away. In the other conditions, there was an inverted U-shaped masking
function: Performance was worst when a 4/6 whole-tone step separated the
second interference tone from the standard. When the separation was a whole
tone (6/6 step), performance was not different from the control condition, nor
was it different when only a 1/6-step separation was used. These results are
shown in the lower function of Figure 2. In other words, the most
interference occurred at an intermediate separation of the mask and standard
target. This outcome fits the typical pattern for lateral inhibition, with
most masking at an intermediate spacing of target and mask along some relevant
distance dimension. Here, however, the dimension is tonal distance rather
than spatial distance on the retinal mosaic.

In the next experiment, Deutsch and Feroe made both the second and the
fourth of the six interfering tones similar in pitch to the standard. For
this arrangement, the standard and the second and fourth interfering tones are
being considered as a target and two masks. The second interfering tone was
fixed at a 4/6-tone separation, the interval that produced the most
interference in the previous comparison. The fourth interfering tone in this
new experiment was varied in pitch relative to the second interfering tone in
the same degrees used before: 0, 1/6, 2/6, 3/6, 4/6, 5/6, and 6/6 whole-tone
steps apart.

The logic behind the second experiment of Deutsch and Feroe was that the
fourth tone in the interference series should mask the second tone in the
interference series. This masking should be strongest at the same separation
(4/6 tone) that produced the strongest masking between the second interfering
tone and the target. However, one cannot observe masking going on among
interference tones directly. The only performance measure is the
same-different response on the comparison tone. Provided the system operates
according to recurrent lateral inhibition, however, there is a prediction to
be made relative to performance on the same-different task: The effect of
double masking (both the second and fourth of the interfering tones) should
occur in the form of disinhibition, with the fourth interfering tone producing
better performance on the standard tone than would have occurred with only the
second interfering tone operating. This would be because the fourth
interference tone would inhibit activity of the second interference tone and
the second interference tone would thereby be less able to inhibit the target.

Figure 2 (upper function) presents the Deutsch-Feroe results for the
double masking conditions. Several aspects of the results are noteworthy.
First, in general, having both the second and fourth interfering tones close
in pitch to the standard produced more errors than having only the second one
close in pitch. This seems, on the face of it, to represent the opposite of
disinhibition: two masks leading to worse performance than one. One might
have insisted that disinhibition would be shown only to the extent that a
double-masking condition led to better performance than a single-masking
condition. However, that conclusion would be premature. The real question is
whether, when the distance separating the second interfering tone from the
target is fixed at 4/6 of a whole-tone step, performance gets better or worse
when the fourth interfering tone is set up to interfere with the second.
Thus, the relevant point from the single-mask curve is the one at 4/6-step
separation, and that point is to be compared with those on the double-mask
curve of Figure 2. Of the latter points, it is the 4/6-step separation
between the second and fourth interfering tones that is of greatest interest,
and there is an unambiguous "absolute" disinhibition effect. Furthermore, the
functional relationship between mask delay and performance is precisely
opposite for the double- and single-mask conditions. Whereas inhibition in
the single-mask conditions was an inverted U-shaped function of mask delay,
there is a U-shaped function when one considers the timing of the second of
two masks.

Deutsch and Feroe's experiment thus demonstrates disinhibition in auditory
short-term memory. Presently, the result will be rationalized within a
theoretical context that draws on ideas of lateral inhibition from sensory
psychology, but this is a form of cognition that is obviously higher than
the retina of the horseshoe crab. The second lesson of this experiment is
that the important signature of disinhibition, empirically, is as much (or
more) the functional relation between performance and delay in single- and
double-mask conditions as it is the simple observation of better performance
with double than with single masks. The experiments to be reported below are
similar in logic and design to the Deutsch and Feroe experiments.

A Model for Disinhibition in Auditory Memory

Figure 3 presents a schematic model based on assumptions made by Crowder
(1978; see also Crowder, 1981, 1982). The grid symbolizes a two-dimensional
memory representation for auditory events. Entries are classified by time of
arrival and by "channel." At this point, the definition of channel remains
unclear. Words spoken by two different speakers would come over different
channels, the more so if the two speakers were of different sexes. Words from
the same speaker, but located differently in auditory space, would be entered
on different channels as well. The channel separation of a speech sound and a
nonspeech sound (tone) would be extremely large compared with differences
among speech channels. Changes in pitch or stress from a single speech source
might or might not produce functional channel separation. In any case, it is
quite easy to accept that a single source remains ordinarily on one channel
and that the classic operations of selective attention for channel separation
(voice quality, location, and so on) result in multichannel stimulation. That
much granted, there is no need at the present level of development of the
theory to be obsessed with the exact defining features of channel differences.

The model assumes that distinctions in time of arrival and channel are
registered in a neurally spatial form, and that there is some sense in which
information arriving at different times "goes to different places," as does
information arriving over different channels. This two-dimensional memory
array obviously sets the stage for applying the ideas of lateral inhibition,
which depart from the two-dimensional array formed by cells in the retina.

So far, the grid model specifies only that an auditory event will produce
activation of some kind at the intersection formed by its arrival time and
source channel. For the representation to be useful in a functional sense, it
should also provide information about what occurred at a particular time on a
particular channel. As Figure 3 indicates, this problem is addressed by the
assumption that grid entries consist of crude spectrograms of the auditory
event in question. The idea of a sensory store holding spectral information
for auditory events is also a feature of Klatt's (1960) speech perception

Figure 3. A representation for auditory memories in two-dimensional neural
space. Entries are classified by channel of entry and time of
arrival. The entries themselves are equivalent to rough spectrograms.

Thus, Figure 3 models the state of the auditory memory system following
presentation of two steady-state vowel sounds distinguished by their second-
formant frequencies, both occurring on the channel marked "signal" and
occurring one after the other. It is assumed that entries like those shown in
Figure 3 operate according to the rules of recurrent lateral inhibition.
Specifically, this means there should be an inverted U-shaped masking function
relating the masking effect of one entry upon another as a function of the
Euclidean distance between them, distance either in source channel or in time
of arrival. Furthermore, if the form of lateral inhibition is indeed
recurrent, a second masking stimulus should degrade the first masking stimulus
in a way that produces disinhibited performance on the target item.
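
The two predictions just stated can be illustrated with a toy network. The
following is only a sketch under stated assumptions, not a simulation taken
from the model's source: the Gaussian form of `coupling`, the parameter values
(`peak`, `k_max`, `width`), the equal excitation of all entries, and the
function names are all hypothetical choices made for illustration. Entries
sit on a single distance dimension; in the recurrent version each entry is
inhibited by the current responses of its neighbors, whereas in the
nonrecurrent version it is inhibited by their fixed excitation.

```python
import math

def coupling(distance, peak=0.5, k_max=0.4, width=0.15):
    """Inhibitory weight between two grid entries as an inverted-U function
    of their separation (hypothetical Gaussian bump centred on `peak`)."""
    if distance == 0:
        return 0.0
    return k_max * math.exp(-((distance - peak) ** 2) / (2 * width ** 2))

def recurrent_responses(positions, excitation=1.0, iterations=500):
    """Recurrent inhibition: each entry is inhibited by the current
    *responses* of the others. Solved by synchronous fixed-point iteration."""
    n = len(positions)
    responses = [excitation] * n
    for _ in range(iterations):
        responses = [
            max(0.0, excitation - sum(
                coupling(abs(positions[i] - positions[j])) * responses[j]
                for j in range(n) if j != i))
            for i in range(n)
        ]
    return responses

def nonrecurrent_responses(positions, excitation=1.0):
    """Nonrecurrent inhibition: each entry is inhibited by the fixed
    *excitation* of the others, so a second mask can only add inhibition."""
    n = len(positions)
    return [
        max(0.0, excitation - sum(
            coupling(abs(positions[i] - positions[j])) * excitation
            for j in range(n) if j != i))
        for i in range(n)
    ]

# Target at 0.0, first mask at the most damaging spacing (0.5),
# second mask a further 0.5 beyond the first.
single = recurrent_responses([0.0, 0.5])[0]
double = recurrent_responses([0.0, 0.5, 1.0])[0]
print(f"recurrent:    single={single:.3f}  double={double:.3f}")
print(f"nonrecurrent: single={nonrecurrent_responses([0.0, 0.5])[0]:.3f}  "
      f"double={nonrecurrent_responses([0.0, 0.5, 1.0])[0]:.3f}")
```

With these assumed parameters the target's response is larger with two masks
than with one only in the recurrent version, which is the sense in which
disinhibition distinguishes recurrent from nonrecurrent inhibition.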

Application of the grid model to the Deutsch and Feroe (1975) experiment.

In Figure 4, the experiment of Deutsch and Feroe is schematized in terms of
the grid model. T1 and T2 stand for the standard and comparison tones,
respectively; I2 and I4 stand for the second and fourth in the series of the
six interfering tones (the other interfering tones were distant enough to be
out of the picture). The only significant change is a simplification of the
model, to the effect that the dimension of pitch is substituted for channel.
It seems reasonable that, in a context where only tones differing in pitch can
occur, the tonotopic organization would stand for channel differences. One
can imagine the tonotopic organization of Figure 4 as an expanded "blowup" of
just one segment of the larger channel dimension represented in Figure 3. The
model of Figure 4 is simpler, furthermore, because the information contained
at one of the grid intersections need only be a unidimensional activation. In
this sense, the analogy to the visual system is much closer: Information in
the network is only that a particular location was active.

From the Deutsch and Feroe result of Figure 2, it can be seen that pitch
separations of 1/6 or 2/6 whole-tone steps lie within the integration zone of
the representation of the standard. Separations of 5/6 or 6/6, on the other
hand, lie beyond the reach of the lateral inhibitory connections. Shown in
Figure 4 are 4/6-step separations of the standard from the second interfering
tone and of the latter from the fourth interfering tone.

Application to the suffix experiment. The major point in Crowder (1978)
was application of the Figure 3 model to the stimulus suffix experiment.
Briefly, the reasoning is that each of the memory list items gets entered, as
it is heard, at the appropriate intersection of arrival time and input
channel. By the time the end of the list comes, a process called "damped
oscillation" (Cornsweet, 1970) will have reduced the potency of the entries
for the early list positions, and, therefore, it is legitimate to restrict
attention to the final end of the list. Baddeley and Hull (1979) and Engle
(1980) have recently provided solid evidence that the last serial position is
the only place to look, in modality and suffix experiments, for evidence
relevant to auditory sensory memory.

In the situations of interest here, the information all comes in over a
single channel. (Although one could argue that different spoken items carry
spectral information that varies like the tones of the Deutsch and Feroe,
1975, experiment, the important channel determiner may be the speaker's
fundamental pitch and not the changing formant structures.) In the control
condition of the suffix experiment, there is either no item following the last
memory stimulus, or there is a recall cue on another channel (a buzzer or
tone) that is so far removed in the grid that it might as well not have
occurred for purposes of the auditory store. (Of course, that does not mean
it is ignored by the subject. Two stimuli can quite well be out of reach on
the auditory memory representation but wind up in a common working memory
store.) Of the last few items, then, the final one should have an especially
strong representation on the grid because it is receiving inhibition from only
one, rather than two, directions. When a suffix is added on the same channel
as the memory list, it is the suffix that receives the benefit of this edge-
sharpening process: Now the final memory item is, like the other memory
items, getting inhibition from both of the two items neighboring it.
Disinhibition occurs when a second suffix is added after the first suffix, for
the reasons explained above.

Figure 4. The Deutsch and Feroe (1975) experiment described in terms of the
model in Figure 3. T1 and T2 refer to the standard and comparison
pitches, respectively. I1, I2, ... I6 represent the six interfering
tones interpolated between T1 and T2.

The focus from which the present research derives is the set of
predictions for inhibition as a function of grid separation between the last
memory item and one or two suffixes. Grid separation will be operationalized
here as time separation rather than channel separation. There are quite a few
published experiments on the timing of the suffix. Crowder (1978, Figure 5,
page 515) presented a composite graph from several experiments varying the
time delay between the last memory item and a single suffix from 0 to 2 sec.
The measure of performance was how damaging the suffix was to the last memory
item. The form of the overall function was an inverted U, with maximum
interference occurring somewhere between .5 and 1.0 sec. We may conjecture
that this is analogous to the lower function of Figure 2, the inverted U
obtained by Deutsch and Feroe (1975) for a single mask as a function of its
separation from the standard. The purpose of the first experiment in this
series was to demonstrate this U-shaped function within a single experiment
and to estimate the spacing at which a single suffix has its maximum effect.
This estimate can then be used to fix the first of two suffixes and test for
disinhibition as a function of the spacing between the first and second
suffixes.

EXPERIMENT 1 

In this experiment, there were nine conditions, with parametric variation
in the time separation of the last memory item from a single suffix. It was
expected from previous work (see Crowder, 1978, Figure 5) that there would be
an inverted U-shaped function relating the size of the suffix effect to suffix
delay. The purpose was to make a numerical estimate of the inflection point
of this function at which masking is greatest.

Method

Subjects. The subjects were 20 paid volunteers of college age. Most
were Yale undergraduates and 12 were males.

Design. All subjects served in nine conditions, which varied according
to the time delay between the last item in the memory list (nine digits) and
the occurrence of the suffix "go." There were 90 trials, ten each for the nine
delay conditions. These were randomized within blocks of nine trials so that
no condition repeated itself until all nine had occurred. Two versions of the
experiment were prepared: In the second, the memory items were exchanged on a
random basis, so that wherever the digit 9 occurred in the first version, the
digit 7 might occur in the second version, for example, and so on. However,
the order of suffix delay conditions was the same for the two versions. This
meant that performance in a given condition was based on a total of 20
different nine-digit stimuli.

Materials. The nine digits, the word "go," and the word "ready" were
recorded by a male speaker. They were then digitized by the Haskins
Laboratories Pulse Code Modulation system and stored in computer files. Other
routines were then available for sequencing these utterances in specified
timing relations and synthesizing them on audiotape. These procedures assured
that a given utterance sounded identical regardless of the list or
experimental condition in which it occurred. Experiments of this sort are basically
impossible without these precautions, for the prosodic output of a real-time
speaker is quite likely to be affected by the same variables as those tested
as experimental manipulations in suffix experiments.

Each of the digits and the word "go" were placed in a 300-msec frame in
such a way as to be roughly "P-centered" (Morton, Marcus, & Frankish, 1976).
No effort was made, however, to correct the natural tendency for some digits
to be spoken faster than others, so there was some variation among them in the
amount of silence. A 300-msec gap was placed between all adjacent items on
the test tape. Thus, it sounded as if the list were being spoken rhythmically
at a rate of 600 msec/item.

A trial began with the word "ready," followed by a gap of 500 msec, and
the nine digits, set at a stimulus onset asynchrony of 600 msec. The stimulus
onset asynchrony of the ninth memory item relative to the suffix was varied in
100-msec steps from 100 to 900 msec. To accomplish this, the memory items
were recorded on one channel and the suffix item was recorded on the second
channel of a stereo tape recorder. Fifteen seconds were allowed after the
suffix, for written recall, before the next ready signal occurred.
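
The timing of a trial can be laid out as a short calculation. This is only
an illustrative sketch: the function name and dictionary keys are
hypothetical, times are measured from the onset of the first digit (the
duration of the word "ready" is not given in the text), and it is assumed
that the 15-sec recall period is counted from suffix onset.

```python
def trial_onsets(suffix_soa_ms, n_digits=9, item_soa_ms=600, recall_ms=15000):
    """Onset times (msec) measured from the onset of the first digit."""
    digit_onsets = [i * item_soa_ms for i in range(n_digits)]
    # The suffix is timed relative to the onset of the last memory item.
    suffix_onset = digit_onsets[-1] + suffix_soa_ms
    return {
        "digits": digit_onsets,
        "suffix": suffix_onset,
        # Assumption: the 15-sec recall period is timed from suffix onset.
        "next_ready": suffix_onset + recall_ms,
    }

# The nine delay conditions: 100, 200, ... 900 msec.
for soa in range(100, 1000, 100):
    print(soa, trial_onsets(soa)["suffix"])
```

At the shortest delays the suffix begins inside the 300-msec frame of the
ninth digit, which is why the two had to be recorded on separate tape
channels.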

Procedure. The stimuli were presented to subjects who were tested in
small groups (one to five individuals) over loudspeakers placed at different
sides of the room. How loud the materials seemed depended on where the
subject sat, as did, to a slight extent, the relative loudness of the memory
items and suffixes (see Crowder, 1978, for data on the importance of these
factors). In any case, the memory items and the suffix were on "different
channels" with respect to the model of Figure 3.

The instructions called for written, ordered recall. The subjects were
told that the suffix "go" was a signal telling them when to write down the
nine digits. Opposite each trial number was a set of nine blanks that were to
be filled in from left to right, with no backtracking. If the subject failed
to remember what went in a position, he or she was to draw a dash in that
space. There was a 2-min break halfway through the session.

Figure 5. Performance on the ninth serial position as a function of delay for
a single suffix (lower curve) or delay for the second of two
suffixes (upper curve). In the latter case, the first suffix was
fixed at a delay of 500 msec.

Results

Figure 5 shows the results, in the lower curve in each panel, marked
"single." In the upper panel are the normalized proportions of errors on the
final serial position. For each subject, the errors on the last position in
each condition were divided by the total number of errors made on all
positions in that condition. In the lower panel, the raw number of final-
position errors is given. The normalized errors are the more analytical
scores, because they discount the operation of variables that influence
performance all across the list rather than just at the end. The layout of
Figure 5 (and of the others in this series) is different from that used by
Deutsch and Feroe (1975) (see Figure 2) only in that the double-mask curve has
been shifted to the left in order to lie over the single-mask curve.
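
The normalized score described above can be written out as a small
calculation. The sketch below is illustrative only: the function names and
the example error counts are hypothetical, and the treatment of an error-free
condition (where the ratio is undefined) is an assumption, since the text
does not say how such cases were handled.

```python
def normalized_last_position_error(errors_by_position):
    """Errors on the last serial position divided by the errors made on all
    positions, for one subject in one delay condition."""
    total = sum(errors_by_position)
    if total == 0:
        return None  # assumption: score undefined for an error-free condition
    return errors_by_position[-1] / total

def mean_normalized_score(subject_error_counts):
    """Average the per-subject scores for one condition, skipping subjects
    whose score is undefined (an assumption, as noted above)."""
    scores = [normalized_last_position_error(c) for c in subject_error_counts]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None

# Hypothetical error counts by serial position (1-9) for two subjects
# in a single delay condition; these are not data from the experiment.
example = [
    [0, 1, 2, 3, 3, 3, 3, 3, 2],
    [1, 1, 1, 1, 1, 1, 1, 1, 2],
]
print(mean_normalized_score(example))
```

Dividing by the total in this way removes differences in overall accuracy, so
the score reflects how errors are distributed across the list rather than how
many were made.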

Clearly, the single-suffix data show the predicted inverted-U form, with
the largest effect at an intermediate delay. An overall one-way analysis of
variance was conducted prior to testing for trend. The result is given in the
first row of Table 1 in the column labelled "Overall F." In fact, the
reliability of this analysis was borderline, F(8,152) = 1.92, MSe = 3225-4, p
< .10; however, a glance at Table 1 shows that Experiment 3 of the present
series yielded a reliable F for this particular comparison. Furthermore, the
obtained function was the one predicted. Trend analyses of the first four
degrees are also shown in Table 1, where it is seen that the expected
quadratic component was highly significant. The best-fitting quadratic
function, obtained by a least squares method, is shown in the upper panel of
Figure 5 for these data. The fitted function reaches a maximum at 548 msec.
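
The estimate of the delay at which a fitted quadratic reaches its extreme can
be sketched as follows. This is a minimal illustration, not the routine used
for the reported fits: the function names are hypothetical, the normal
equations are solved by Cramer's rule for simplicity, and the data are
synthetic values generated from a known parabola rather than the
experimental means.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]          # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(normal)
    coeffs = []
    for col in range(3):                                      # Cramer's rule
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def vertex(b, c):
    """Location of the extreme: a maximum if c < 0, a minimum if c > 0."""
    return -b / (2 * c)

# Delays in units of 100 msec; synthetic scores from a parabola peaking
# at 5.48 units. These are NOT the data of Experiment 1.
delays = list(range(1, 10))
scores = [0.30 - 0.02 * (x - 5.48) ** 2 for x in delays]
a, b, c = fit_quadratic(delays, scores)
print(round(100 * vertex(b, c)), "msec;", "maximum" if c < 0 else "minimum")
```

The sign of the quadratic coefficient distinguishes the inverted U expected
for a single suffix from the U expected for the second of two suffixes.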

Discussion

The results of Experiment 1 give the information needed for continuing
with double-suffix conditions: The quadratic function relating single-suffix
delay to performance was amply confirmed with a relatively small number of
observations. Although there was no need for a no-suffix control in this
experiment, the obtained magnitude of the suffix effects in Figure 5 was
rather smaller than that found in studies using comparable techniques. This
is almost surely a result of having placed the suffix and memory items on
different loudspeakers and having given them different spatial sources. This
channel separation is expected, from the model of Figure 3, to reduce the
suffix effect overall. It was built into the design of Experiment 1 in order
to minimize direct, integration masking of the memory item by the suffix (see
Crowder, 1978). In any case, the magnitude of the suffix effect was not at
stake here, only its dependence on the suffix delay.

EXPERIMENT 2

The second experiment used two suffixes, the first fixed at a delay of
500 msec in all conditions. The purpose was to see whether the relation
between second-suffix delay and performance would be a mirror reflection of
the single-suffix performance, as would be expected from the disinhibition
assumption. The second suffix was presented at the same nine stimulus onset
asynchronies (100, 200, ... 900) relative to the first suffix as were used in
Experiment 1 to separate the single suffix from the last memory item.

Method

The experiment was similar in all details to Experiment 1 with the
following exceptions: The n was increased from 20 to 30 subjects, 19 of whom
were males (from the same source as Experiment 1). There were three versions
of the same 90 memory trials, produced by isomorphic mapping of individual
digits from one version to the next. Ten subjects received each of the three
versions. Finally, the word "go" was said twice at the end of each list, the
first time at a stimulus onset asynchrony of 500 msec and the second time at
one of nine stimulus onset asynchronies varying in 100-msec steps between 100
and 900 msec. The memory stimuli and second suffix were recorded on one
stereo channel and the first suffix on the other. As in Experiment 1, the two
channels were separated by means of loudspeakers placed on different sides of
the experimental room. Keeping the memory items and the first suffix on
separate channels was intended to reduce integration masking of the last
memory item, that is, masking through a process of simple "drowning out." It
will be seen in Experiment 4 that these channel differences turned out to be
inconsequential in the present type of experiment.

Results 

Figure 5 shows the results of Experiment 2 in the upper functions of both
panels. A statistical summary of the outcome is in Table 1, second row. The
overall F was statistically reliable in this experiment, indicating
that the normalized errors on the last position were significantly affected by
the placement of the second suffix. The form of the function is weakly curved
in the mirror image of the single-suffix function from Experiment 1. The
reliability of the quadratic trend in Experiment 2 was just short of the .05
level of confidence, F = 3.61, p < .10. The best-fitting quadratic function
is written in Figure 5 for the data of Experiment 2. It is notable that the
minimum of this function is very close to the maximum of the function from
Experiment 1 (557 vs. 548).

Discussion 

By and large, these data fall into the predicted pattern for recurrent
lateral inhibition. Two features of these data are worrisome, however:
First, there was no "absolute" disinhibition in the sense that two suffixes
led to better target performance than one. Second, the results from the first
two studies were not statistically impressive. When the quadratic trend was
reliable (Experiment 1), the overall F for conditions was not, and where the
latter was reliable, the trend fell just short of statistical significance.
For these reasons, further data were collected with very similar experimental
procedures.

EXPERIMENT 3 

The third experiment combined Experiments 1 and 2 into a single design.
The same stimulus tapes were used as in the earlier studies, but the two
loudspeakers were placed side by side, so that all materials came from the
same apparent source in both conditions. Thirty-six subjects received the
single-suffix tape and another 36 received the double-suffix tape. Within
each condition, there were three mappings of individual digits into the basic
schedule of memory items.

Results 

Figure 6 shows the results of Experiment 3, plotted the same way as those
of Experiments 1 and 2. The statistical outcomes are summarized in Table 1,
third and fourth rows. In the single-suffix condition, there was a highly
significant overall F for conditions and significant trends for linear through
cubic degrees. The best-fitting quadratic function is shown in the figure;
its maximum is 646 msec, which is slightly less than 100 msec different from
the maximum for the function fitted to the single-suffix conditions of
Experiment 1.

The results for the double-suffix conditions of Experiment 3 are much
less impressive. There was no reliable overall effect of second-suffix delay
here, nor was any trend component close to reliability. However, Experiment 3
did show reliable absolute disinhibition: On Positions 6 and 7, performance
was significantly better with two suffixes than with one, t(70) = 1.93, p <
.05. The present experiment is a more appropriate place to look for absolute
disinhibition than Experiments 1 and 2 because there was no confounding
between suffix number and suffix location and because the subjects were more
closely comparable, at least in time of testing. In fact, the results of
Experiments 1, 2, and 3 are really quite comparable if one looks at
disinhibition as measured by the difference between normalized last-position
errors in the single- and double-suffix conditions. Such data are shown in
Figure 7. The correlation between these two sets of points is -.56, which
shows that both data sets are reliable, and that there is considerable shared
variance between them.

Figure 6. Results for the single- and double-suffix conditions of Experiment
3, plotted the same way as in Figure 5.

The restriction of disinhibition to a narrow range in timing of the
single- and double-masking events is consistent with what was found by Deutsch
and Feroe (1975). In their experiment, absolute disinhibition was obtained
only when the maximum of the single-mask condition was compared to the minimum
of the double-mask condition (see Figure 2). This also raises a note of
caution for experimenters seeking to replicate the effect: Unless these time
intervals are delicately calibrated, it is quite likely that one will miss the
phenomenon (e.g., Watkins & Watkins, 1982).

EXPERIMENT 4

The precarious consistency of the statistical evidence from Experiments
1, 2, and 3 raises still another danger. Perhaps the pattern of Figure 6 is
coming entirely from the single-suffix conditions, with the double-suffix
conditions serving as little more than baseline controls. The significant
overall F from Experiment 2 and the associated quadratic trend would be
considered Type I errors from this viewpoint. The final experiment in this
series was an effort to determine whether a U-shaped masking function, with
reliable quadratic trend, is "really there" in double-suffix experiments of
this type. It was also intended to clear up whether diversity in the spatial
sources of the two suffixes makes a difference. In the double-suffix
conditions of Experiment 2, the first suffix was on the opposite channel from
that which had carried the memory stimuli and the second suffix returned to
the stimulus channel. In Experiment 3, however, all information came over a
single channel.

Method 

The method of Experiment 4 was identical to those of the first three
experiments except for the following points: Sixty new subjects were used, 30
in each of two groups. In both groups, there were always two suffixes. One
group corresponded to the spatial arrangement of Experiment 2 and the other
group corresponded to the spatial arrangement of Experiment 3.

Results

An overall analysis of variance with spatial location of suffixes as one
factor (single versus double source) and second-suffix delay as the other
showed no main effect of spatial location or interaction of spatial location
with second-suffix delay, F < 1.0 for both. Therefore, the two spatial
arrangements have been combined for all subsequent analyses, making this a
single-factor, nine-condition experiment. Figure 8 shows the results for
normalized last-position errors in the upper panel. The raw errors are shown
below, with the single-suffix conditions of Experiment 3 added for comparison.
The last row of Table 1 shows that the overall effect of second-suffix delay
was statistically reliable and that the only reliable trend component was the
quadratic one. The best-fitting quadratic function has a minimum at 408 msec.
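
The trend components referred to here rest on orthogonal polynomial
contrasts over the nine equally spaced delays. The sketch below shows one
way such contrasts can be derived and applied; it is an illustration under
assumptions, not the analysis routine used for Table 1. The function names
are hypothetical, the contrasts are built by Gram-Schmidt orthogonalization
of powers of the centered delay index, and the sum of squares uses the
standard single-degree-of-freedom contrast formula for equal group sizes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_contrasts(n_levels, max_degree):
    """Polynomial trend contrasts (linear, quadratic, ...) for equally
    spaced levels, built by Gram-Schmidt on powers of the centred index."""
    xs = [i - (n_levels - 1) / 2 for i in range(n_levels)]
    basis = [[1.0] * n_levels]          # degree 0: the grand mean
    contrasts = []
    for degree in range(1, max_degree + 1):
        v = [x ** degree for x in xs]
        for u in basis:                  # remove all lower-degree trends
            proj = dot(v, u) / dot(u, u)
            v = [vi - proj * ui for vi, ui in zip(v, u)]
        basis.append(v)
        contrasts.append(v)
    return contrasts

def contrast_ss(means, coeffs, n_per_condition):
    """Sum of squares (1 df) for one contrast applied to condition means."""
    return n_per_condition * dot(coeffs, means) ** 2 / dot(coeffs, coeffs)

linear, quadratic, cubic, quartic = orthogonal_contrasts(9, 4)
# Rescaled, the quadratic weights are the familiar integer coefficients.
print([round(3 * c) for c in quadratic])
```

Each trend F is then the contrast sum of squares divided by the error mean
square from the overall analysis, with 1 degree of freedom in the numerator.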

Discussion

The results of Experiment 4 confirmed that the U-shaped masking function
for second-suffix delay is not wishful thinking or a false positive. If one
wishes to make the comparison shown in the lower panel of Figure 8 between the
conditions of the present experiment and the most comparable single-suffix
conditions available in this series, there is also ample evidence here for
absolute disinhibition. These are the two hallmarks of disinhibition: the
mirror reversal of the masking-delay function and the occurrence of absolute
disinhibition. There seems to be no reason for retracting Crowder's (1978)
hypothesis that suffix experiments can be explained within the grid model and
that it is a form of recurrent lateral inhibition that seems to relate entries
on that grid.

GENERAL DISCUSSION

The results of these four experiments differ in statistical impact and
the fitted functions from them show different idiosyncrasies. However, a
common theme in them is the predicted quadratic trend. The minima and maxima
of the best-fitting quadratic functions show a reasonable convergence on
something in the neighborhood of .5 sec as the critical spacing for the
strongest lateral inhibition on the grid. There is also some evidence for
absolute disinhibition in comparisons of performance in single- and double-
suffix conditions.

It could be objected that the results of these experiments depend somehow
on using normalized errors on the final position as the main response measure.
Watkins and Watkins (1982) have taken strong exception to this practice, for
example. One worry might be that the suffix(es) could be affecting items more
than one back in the series and, if so, part of the experimental effects might
be serving in the normalization background. If so, the argument goes, one's
response measure would be tampering improperly with the effect itself. There
are many considerations on both sides of this issue. Rather than weave
through these arguments here, an alternative data analysis is offered in Table
2, which corresponds exactly to Table 1, except the raw error frequencies on
Position 9 were used instead of the normalized proportions. The two analyses
show much the same picture. The result of Experiment 2 was not as strong with
raw as with normalized errors, and the anomalous result of the double-suffix
conditions in Experiment 3 was pushed over the criterion of reliability with
the new measure. However, the all-important finding of Experiment 4, which
established the U-shaped quadratic trend for double-suffix conditions, was
just as convincing in Table 2 as in Table 1. Thus, although normalized errors
are still the preferred performance index, the conclusions of this research do
not change if an uncorrected measure is used.

Table 2

Statistical Summaries of Experiments 1-4: Raw Error Frequencies

                                                         Trend Components
Experiment          Overall F
and Condition       (1-Way ANOVA)                Linear   Quadr.   Cubic   Quartic

Experiment 1:
  One Suffix        F(8,152) = 1.86, p < .10        .44   12.92*     .24      .56
Experiment 2:
  Two Suffixes      F(8,232) = 1.21, n.s.           .45    1.12     1.23      .00
Experiment 3:
  One Suffix        F(8,280) = 8.83, p < .0005    22.18*  21.94*   16.96*     .23
Experiment 3:
  Two Suffixes      F(8,280) = 5.06, p < .0005     8.16*   7.21*     .21      .39
Experiment 4:
  Two Suffixes      F(8,472) = 2.32, p < .05       4.46*   4.87*     .20      .22

*p < .05

These experiments show that it is not easy to obtain absolute
disinhibition. Only when the timing relations of the two suffixes were exactly right
did the double-suffix condition lead to improved last-item recall. It would
not be surprising if other investigations (Watkins & Watkins, 1982) would have
poor luck showing disinhibition if they used only one set of target-mask and
intermask delays. Also, it should be noted that the original demonstrations
(Crowder, 1978) compared one with three suffixes, whereas the present studies
compare one with two. The mathematics of recurrent lateral inhibition
networks are complex enough that it is not obvious what the relation should be
of double- and triple-masking conditions. In the absence of a formal
simulation of these outcomes, it remains possible that our understanding of
disinhibition is incomplete in this way also.

The magnitude of disinhibition is quite small, however, in these
experiments. It would be highly risky to use the amount of disinhibition as an
indicator of anything else. Rather, the importance of suffix disinhibition is
to settle which type of lateral inhibition, recurrent or nonrecurrent, is the
one to use in formal modeling based on the ideas of Figure 3.

Does disinhibition in the auditory system carry implications that go
beyond the realm of formal models? It seems likely that a system with the
machinery for a sort of temporal edge-sharpening would indeed be important in
domains such as speech perception and music. However, these applications
should be accomplished with the overall model rather than with the specific
assumptions connected with disinhibition.
LETTER TO THE EDITOR, Journal of Phonetics*



Leigh Lisker+



Dear Sir,


The idea that a notational device can of itself explain a body of
observational data seems to be held by certain linguists. I have in mind,
specifically, a recent paper by Walsh and Parker (Journal of Phonetics, 1981,
9, 305-308), which takes Raphael (Journal of Phonetics, 1975, 3, 25-33) to
task for presuming to describe a physiological finding of his as an
explanation for the greater length of vowels before voiced than before
voiceless consonants in English. They advance, instead, the (to me) curious
view that this greater duration is explained by calling it an effect
"triggered" by an "abstract" property of the phonological set /b,d,g,.../.
Since this abstract property, [+voice], is said by them to have "a number of
acoustic and articulatory correlates" (p. 306), one of them no doubt the
longer vowel duration, this so-called explanation is quite circular.
Raphael's study, seriously misrepresented by Walsh and Parker, aimed to find
out whether the vowel length difference is attributable to a difference in
the "motor command" for the vowel, to a difference only in the relative timing
of vowel and consonant "commands," or to some combination of the two. It was,
in Raphael's words, designed to investigate "the physiological activity which
must underlie durational differences, no matter what their cause" (p. 25;
emphasis added by LL). For Walsh and Parker, however, it seems that to name is
to explain. Only thus can we understand what they mean when they write, in the
inflated style fashionable among linguists, that the abstract [±voice] feature
"predicts" relative vowel duration.

Not only does neither an abstract nor an observable [±voice] feature
dimension explain vowel length variation, but it is surely prejudicial to
assume that it is the longer vowel before /b,d,g,.../ that needs explaining
rather than the shorter one before /p,t,k,.../, or that it is appropriate to
deal with vowel duration without attention to the correlative consonant
duration. Walsh and Parker are entirely correct when they emphasize that the
[+voice] feature as conventionally defined by phoneticians is inadequate for
identifying obstruents as members of the /b,d,g,.../ and /p,t,k,.../ sets.
This long recognized fact is what motivated the once prevalent view that the
two sets are more reliably distinguished by a difference of articulatory force
([±tense] or [±fortis]) than by one of voicing. Since a vowel is longer
before a voiced consonant belonging to /b,d,g,.../, it may be that we learn to
pronounce the longer vowel even before a "devoiced" consonant assigned to the
same set, i.e., a consonant that may be otherwise identical phonetically with
an abstractly and observably [-voice] consonant of the /p,t,k,.../ set. The
fact that a vowel is longer before a voiced consonant does not imply that a
vowel is longer only before a voiced consonant; vowel length differences do,
after all, function distinctively in many languages. It may be true, on the
other hand, that even in languages with distinctive vowel length there is a
connection between vowel duration and consonant voicing.

Assuming that we are reluctant to regard the vowel duration difference
that is preserved despite the voicelessness of /b,d,g,.../ as a case of a
phonemic split in progress, we may speculate that vowel shortening is an
effect of the devoicing gestures associated with /p,t,k,.../, while the
devoicing of /b,d,g,.../ has a very different etiology. This might be the
case, in particular, where /b,d,g,.../ are phonetically voiceless even though
adjacent to intervals associated with voiced segments. In such a context
voicelessness could result from a cessation of glottal airflow with no change
of the larynx from a [+voice] state. In that event physiological data would
have an explanatory value not possessed either by acoustic data or by the
abstract [+voice] feature. Moreover, it would make more understandable why
listeners label some consonants b, d, g, ... despite their voicelessness, and
why linguists prefer to transcribe them phonetically as [b̥,d̥,g̊,...] rather
than [p,t,k,...].
IS IT JUST READING? COMMENTS ON THE PAPERS BY MANN, MORRISON, AND WOLFORD AND
FOWLER*



Robert G. Crowder+


My comments on the stimulating papers by Mann (in press), Morrison (in
press), and Wolford and Fowler (in press) come under four headings. First, I
identify their differences with respect to the organizing theme. Second, I
discuss the central difficulty, for theories of reading disability, posed by
the high correlation between reading and IQ, and ways of dealing with it.
In the third and fourth sections, I comment on the individual
papers and summarize what I think are the main lessons to be learned from this
collection.

How the Papers Differ 

One crucial question posed in these papers is whether the disability
shown by poor readers is more general or less general than the process of
reading itself. If one thinks the problem with disabled readers lies with
letter perception, then one has implied the problem is less general than
reading; if one thinks the problem is in low IQ, then one has implied it is
more general. Of the three participants, Mann (in press) has identified
herself and her colleagues at Haskins Laboratories with the "less general"
point of view. Their position is that it is a subcomponent of reading that
holds back the typical poor reader: his or her inability to achieve and
maintain a phonetic code for short-term memory. This is not to say that the
defective phonetic coding does not compromise processes other than reading; in
research that I shall mention again below, Brady, Shankweiler, and Mann (in
press) have shown that phonetic perception in the auditory mode is also
differentially impaired in poor readers.

Morrison (in press) and Wolford and Fowler (in press) think the typical
problem with reading disability is more general than the reading skill itself.
The former attributes the problem to difficulty in the learning of irregular
rule systems, of which the especially relevant example is the set of
correspondences between graphemes and sounds in English. Wolford and Fowler
(in press) attribute the problem to difficulty in generating a response on the
basis of partial information. These two mechanisms are quite obviously more
abstract than a specific, phonetic encoding deficit.



*In press, Developmental Review.
+Also Yale University.
A second dimension of variation among these three papers is whether they
assign the reading deficit to a process that is specifically linguistic or
not. There is no reason that our three authors should have assorted
themselves in the same way on these issues as on the first, but they do:
Mann's endorsement of phonetic coding as the major problem puts her quite
clearly in the linguistic-deficit camp, whereas Morrison and Wolford and
Fowler have chosen more abstract cognitive deficits.

A third dimension of variation is on the matter of who exactly
constitutes the impaired-reading population we are concerned about. Mainly, the
question is whether or not to consider IQ differences as an inherent part of
reading disability. I would have expected more discussion of this very
important point than I found in these papers. Wolford and Fowler alone come
out and face the issue head-on, in a refreshing survey of IQ differences
between groups of good and poor readers "matched" on IQ. Even with deliberate
matching to remove this confounding, the vast majority of studies do show an
IQ advantage for the good readers; Wolford and Fowler conclude that the
association is an inescapable one. In the opening paragraphs of his
contribution, Morrison assumes the opposite position. So does Mann, by virtue
of the effort she and her colleagues have made to exclude IQ differences from
good/poor reader comparisons. This issue sets the stage for the next section
of my own paper.

What to Do About IQ Differences 

As an impressionable teenager, I learned from the instructor in my
undergraduate tests-and-measurements course a powerful law of psychology:
"All good things go together." The correlation between reading performance and
IQ ranges from around .50 to .60, in unselected lower school populations, to
over .60 in high school (Sternberg, Note 1). In view of this correlation
between reading and IQ, nothing could be less interesting than to select
children on the basis of high and low reading ability alone and to show them
different on one's favorite information-processing measure. At the very
least, the skills used in reading are only a tiny subset of the skills that
contribute to IQ scores. Since all the skills will tend to go together in
unselected populations, it should not be surprising that one predicts the
other.



If one takes seriously the definition of reading that distinguishes it
from language comprehension over the oral-auditory channel (auding), then
reading skills are not only a tiny, but also a very specific subset of all the
skills that are measured on the major IQ tests. That is, when a reading test
measures comprehension, we would not want to say that a low-scoring individual
is a "poor reader" unless we know that his or her comprehension in reading is
poor in relation to his or her comprehension of the same material in auding.
With tests of reading that mix in ability to comprehend language, written or
spoken, it is indeed a thorny problem whether the IQ test is fundamentally
different from the reading test at all, or just a larger set of cognitive
skills. If the proper distinctions among reading, auding, and comprehension
are made, however, these tests would not properly be used to identify poor
readers.
The issue that lies behind these commonplace observations is not an easy
one: If our definition of reading disability is to exclude intelligence as a
factor, then does this mean that children of low IQ are ineligible to have
reading disabilities? What are we to do with the fact that tests of IQ in
some cases make use of reading skills, and vice versa? What of the fact that
the mixture of skills tapped by both reading and IQ tests changes as one goes
from age six to sixteen? These matters are the subject of several searching
analyses in the opening section of the edited collection by Benton and Pearl
(1978). For the moment, we can agree that one's research strategy should
differ depending on whether, like Mann and the Haskins group, one considers IQ
a potentially troublesome covariate of reading or, like Wolford and Fowler,
one considers reading to measure a fundamental component of IQ.

What can be done if one wants to investigate reading disability with IQ 
held constant? I think there are four solutions, and variants on them, that 
have been used: 

1. One can take pains to match good and poor readers on IQ. This is the
most popular control method and the most worthless. For one thing, Wolford
and Fowler (in press) have demonstrated that the "matching" doesn't work: 22
out of 23 studies they inspected showed that the good readers were smarter, on
the average, than the poor readers. The size of the numerical IQ difference
between groups is irrelevant, as is the fact that the difference is typically
nonsignificant. The nonsignificant difference is to be expected if some group
IQ measure with low reliability is used, or if there are few subjects, either
or both of which circumstances are often the case. The size of the obtained
group difference in IQ is not relevant in view of the potential regression
artifacts that exist. This regression artifact is the really telling argument
against matching. The problem is of course that tests of IQ are less than
perfectly reliable. This means that some of the children scoring high are
really high by accident and would score lower on another round of testing;
likewise, some of the children scoring low are really closer to normal than
their score indicates, and would get a higher score on another round of
testing. If, instead of administering the IQ test again, we administer a test
of something that is correlated with IQ, such as a reading subskill, we would
expect the children who were "accidentally too high" and "accidentally too
low" to move back towards the overall population mean. In order to match
groups of readers on IQ, it is necessary to take good readers who have low
IQ's for their group and poor readers who have high IQ's for their group.
(This is because the traits are so highly correlated in the general
population.) What matching does is virtually to guarantee that the scores of
the good readers will improve on any measure that is correlated with IQ and
that the scores of the poor readers will go down for statistical reasons alone.
Thus, one can go through life testing good and poor readers on
information-processing skills and, as long as these skills are related to IQ,
one will always find good readers doing better than poor readers.

2. Another remedy for the IQ-reading correlation is to use a control
task of some kind and show that good and poor readers do not differ on it.
This method is referred to as convergent-discriminant validation in testing
circles. The presumption behind this strategy is that this control task does
not tap into the reading skill but that it does correlate with IQ. In such a
case, the contribution of IQ could be discounted as responsible for the
differences observed in the two reading groups. Not just anything can be used
as a control task, of course: If, for example, the control task is so easy as
to produce a ceiling effect, so difficult as to produce a floor effect, or so
unreliable as to be insensitive to anything, then it is no good as a control
task.

Brady et al. (in press) have given us a good example of the control-task
strategy that avoids these pitfalls. They were interested in the possibility
that reading ability is related to the ability to achieve a phonetic code from
speech, as well as from print. They found that identification of phonetic
segments was equal for good and poor readers when the intelligibility of
speech was high; however, when masking noise was added, the poor readers
suffered a significant impairment relative to the good readers. The special
strength of this experiment was in a control task in which the sounds to be
identified were naturalistic, non-linguistic sounds. The addition of noise to
these sounds reduced performance to the same level it had for speech sounds;
however, the amount of this reduction was the same for good and poor readers.
Thus, we may conclude that it is the processing of linguistic segments that
discriminates good and poor readers, not just general auditory identification.

The control-task methodology can be useful when wisely applied, but it is
no panacea. There remains the danger that the control task chosen, even if it
is of comparable difficulty to the ostensibly reading-related task, is not
sharing much variance with IQ. In the Brady-Shankweiler-Mann study, for
example, the reasons why adding noise to speech damaged speech-perception
performance might not be the reasons why adding noise to naturalistic sounds
damaged performance on them. By way of an analogy, to include tying of
shoelaces as a control task in reading research might be an empty experimental
gesture even though there can be not the slightest doubt it is correlated at
least with mental age.

3. A third way of dealing with the IQ-reading correlation is to accept
the confounding of good and poor reading-group differences with IQ, at face
value, but to show that it could not be responsible for the obtained results.
Say a particular pattern of data is obtained when subjects are split into
groups on the basis of reading ability; perhaps the good readers show phonetic
confusions but not the poor readers. The danger is that IQ could somehow be
responsible for this pattern. The remedy suggested here is then to split the
entire group of subjects by IQ, pooling together the good and poor readers.
If IQ is responsible for the reading-group difference, then the same pattern
should appear in this second analysis. That is, the high IQ subjects would,
in this case, show the evidence for phonetic coding. If that is not the
result, however, if the IQ split produces no differences in phonetic coding,
then we may be assured that our original observation should not be rubbished
by an IQ-regression artifact. Mark, Shankweiler, Liberman, and Fowler (1977)
have used just this technique in one of their experiments.

There are cautions that go with this method, of course. If we select for
low and high reading performance, we almost guarantee that a subsequent split 
on the basis of IQ will produce a relatively restricted range (again because 
of regression). If the resulting partition of subjects on IQ produces a weak 
but nonsignificant copy of the reading split, there is no protection at all. 
4. The best means of dealing with the IQ-reading association is probably
statistical control. One simple solution is to use IQ as a covariate in
assessing the influence of reading ability. Mann and her colleagues report
several uses of this in her (Mann, in press) paper. Fancier techniques are
possible: With adequate prior measures of IQ, reading performance, and other
predictors, as well as criterion measures on the information-processing task
of interest (all including reliabilities), one is in a position to tease out
the operating relationships with multiple- and partial-correlation methods.
Good examples of this approach are beginning to appear (Jackson & McClelland,
1979). A recent paper by Perfetti, Beck, and Hughes (Note 2) carries this
type of analysis still further: These three investigators employed the logic
of causal analysis, through time-lag correlations, to face the issue of which
component skills "enable" (their term) the later reading skill. This kind of
approach means testing more subjects, for picking extreme groups allows one to
eliminate intermediate cases. But the extra cost of testing more subjects is
small compared to the cost of turning in results that cannot be interpreted.
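
The statistical points behind solutions 1 and 4 can be illustrated with a
small simulation. The Python sketch below is hypothetical throughout: it
assumes a single latent ability that drives an unreliable IQ score, a reading
score, and an information-processing task, with no special link between
reading and the task. The variable names, sample size, noise level, and
matching band are all assumptions chosen for illustration, not values from
any of the studies discussed.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y with z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def mean(values):
    return sum(values) / len(values)


rng = random.Random(1982)       # fixed seed so the sketch is repeatable
N, NOISE_SD = 20000, 0.6        # assumed sample size and measurement error

ability = [rng.gauss(0, 1) for _ in range(N)]            # latent general ability
iq = [g + rng.gauss(0, NOISE_SD) for g in ability]       # unreliable IQ score
reading = [g + rng.gauss(0, NOISE_SD) for g in ability]  # reading score
task = [g + rng.gauss(0, NOISE_SD) for g in ability]     # task driven by ability only

# Solution 1: extreme reading groups "matched" on observed IQ (a narrow band).
ranked = sorted(reading)
low_cut, high_cut = ranked[N // 4], ranked[3 * N // 4]
band = [i for i in range(N) if abs(iq[i]) < 0.3]
good = [i for i in band if reading[i] > high_cut]
poor = [i for i in band if reading[i] < low_cut]

iq_gap = mean([iq[i] for i in good]) - mean([iq[i] for i in poor])
task_gap = mean([task[i] for i in good]) - mean([task[i] for i in poor])

# Solution 4: partial out the observed IQ score, then the error-free ability.
r_given_iq = partial_r(reading, task, iq)
r_given_ability = partial_r(reading, task, ability)

print(f"matched groups: IQ gap = {iq_gap:.2f}, task gap = {task_gap:.2f}")
print(f"reading-task r, IQ partialled out:      {r_given_iq:.2f}")
print(f"reading-task r, ability partialled out: {r_given_ability:.2f}")
```

Under these assumptions the matched groups differ little in measured IQ yet
differ substantially on the task, which is the regression artifact at work.
Partialling out the unreliable IQ score leaves a residual reading-task
correlation that vanishes only when the error-free ability is controlled, one
reason that reliabilities have to enter into the fancier analyses.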

The Mann Paper

In her paper, Mann (in press) continues the careful effort by
investigators at Haskins Laboratories to associate reading disability with
processes at the phonetic level of the spoken language. As she herself states,
and others have increasingly concluded (see Crowder, 1982, Chapter 9), it is
dubious that the speech-based process in reading has much to do with lexical
access. Rather, the interest is in a phonetic short-term memory system that
would hold verbatim information pending higher-level linguistic processing.
Readers are thus hypothesized to use speech in "...reading situations where
sentence structure is at stake... when their task involves recovering the
meaning of written sentences and not simply words alone..." (Mann, in press).
My comments on Mann's paper concentrate on this hypothesis from two points of
view: whether sentence-level comprehension really does depend on a verbatim
short-term memory, and how we should interpret the association of this
short-term memory with reading disability.

First, however, I want to acknowledge the sensitivity shown by Mann and
the Haskins workers to the IQ issue, which I just finished discussing. In most
of their recent studies, Mann and her colleagues have either applied an
appropriate statistical adjustment (covariance analysis) to rule out an IQ
interpretation of the advantage shown by good readers, or have shown that an
IQ split of the subjects does not produce the pattern of interest (solution
number 3, above). It is to be hoped that the work-in-progress done in
collaboration with Shankweiler and Smith (Mann, in press) will receive the
same thorough treatment.

Is a phonetic (verbatim) short-term memory really necessary for
understanding what sentences mean? The rational argument for this hypothesis is
compelling: The language has many distributed forms, for example, auxiliaries
separated from their main verbs by considerable distances; it seems
preposterous that each word could be processed "all the way up" as it is
encountered in the stream of print or speech. This consideration is so
compelling I still believe it, deep down, despite recent evidence that it may
be wrong!






Many of us have taken it for granted that the short-term memory that
serves language comprehension in this way would be phonetic, that is to say,
capable of holding the words themselves at the segmental level for later
analysis. Levy (1978) reports research that is highly troublesome for this
assumption: Her technique was to present an articulatory distractor
(counting) along with the visual presentation of three sentences. The measure
was subsequent discrimination of these sentences from other sentences with
semantic or lexical modifications. The basic finding was that recognition of
the sentences was reduced considerably by the simultaneous articulatory
distractor, a result that suggested that the distractor task incapacitated the
phonetic short-term memory system that is necessary for reading. The problem
comes in another study, in which the memory measure was discrimination of true
and false paraphrases of the presentation sentences. Here, verbatim
information was not worth anything because the words tested were not those
originally presented. Of course, retaining the meaning of the sentence
remained crucially important. In this paraphrase task, performance with the
distractor was no worse than in the control condition, where phonetic
processing was left free. Thus, it seems from this result that reading for
meaning does not depend on a verbatim short-term memory system; otherwise
articulatory distraction would have harmed memory for meaning. Therefore, we
might conclude, if a short-term retention system is important in reading, it
is not a phonetic short-term retention system.

Hitch and Baddeley (reported in Baddeley, 1979) have reported a similar
outcome: They gave subjects sentences expressing simple propositions that
were either true or false (ffiES HAVE WINGS) and had subjects either carrying a
simultaneous digit-memory load or performing a concurrent
articulatory-distractor task. The finding was that keeping the articulatory
(phonetic) system occupied with the distractor task had no effect on
true/false reaction time. However, the digit load did interfere. Again,
comprehension seemed not to depend on an intact speech system, as the
hypothesis of Mann and of many of the rest of us would predict.

Carpenter and Daneman (1981) have offered a different kind of evidence
that suggests comprehension does not ordinarily wait long enough for a process
of phonetic analysis and short-term storage. In their "garden path"
materials, subjects read sentences with words such as BASS in the context of
text about fishing. The word immediately following BASS was, however,
GUITARIST, which undermines the first interpretation that would have been
applied to BASS (that it rhymed with PASS). The measurement of interest was in
visual fixation times, word by word. As would be expected, nothing special
happened up to and including the word BASS. However, fixation times were
reliably longer on the word GUITARIST in the garden path sentences than in
appropriate controls. This means that during the time of a normal fixation,
typically a quarter second, analysis of that word had gone on to a level that
responded to semantic anomaly.

Frazier and Rayner (1982) have shown much the same thing with syntactic
anomaly. Their subjects read sentences such as WHILE SHE WAS SEWING THE
SLEEVE FELL INTO HER LAP. Here, it is the word FELL that receives the
longer-than-normal fixation. The fact that people extend their normal fixation
period of around 250 milliseconds on this word means they must have detected
its anomalous role in the parsing solution that they had been constructing up
to that point. If the syntactic/semantic analysis that supports this had been
awaiting the formation of a phonetic string in short-term memory, it would be
a slower process. I certainly had not previously dreamed that parsing and
analysis of meaning occurred while the person is still looking at the word in
question. Certainly, a phonetic short-term memory representation of a word
would be hard to set up within the first 250 milliseconds that the subject
laid eyes on it. If the trailing phonetic process referred to by Mann and the
Haskins group were comparable to the eye-voice span of oral reading, we should
have expected the "cognitive alarm" to have sounded only some three or four
words after the eyes first rested on the troublesome word FELL.

Thus, we have two discouraging results for the Haskins argument that a
phonetically based short-term retention system is a necessary supporting
process for reading sentences. First, we find that comprehension of the
meaning of sentences is unimpaired by eliminating the phonetic system through
articulatory distraction. Second, we find that high-level comprehension
processes can occur within the quarter-second or so that the eyes are still
fixated on a word, too fast for a trailing phonetic process.

My second reflection on the Mann (in press) paper concerns the direction
of effect that connects a deficit in phonetic processing and a deficit in
reading. Morais, Cary, Alegria, and Bertelson (1979) demonstrated that
learning to read, in illiterate Portuguese adults, has the consequence of
dramatically improving performance in a phonetic segmentation task similar to
those used with children by the Haskins group. The linguistic maturity that
goes with reading thus seems to depend not only on age but on specific
training in only the reading skill itself. I think this is different from the
conclusion Mann (in press) wishes to reach in the concluding section of her
paper, about how linguistic skill may presage reading success. The argument
that the former presages the latter comes from the circumstances that the two
skills were measured in kindergarten and a year later, in first grade,
respectively.

To make a causal argument, however, more is required: The time-lagged
correlation technique, for example, measures the predictor and criterion both
at each of two times. The telling outcome is when the predictor at Time 1
correlates better with the criterion at Time 2 than the criterion at Time 1
with the predictor at Time 2. (This would be true if smoking at age 20
correlated with lung cancer at age 50 more highly than cancer at age 20
correlated with smoking at age 50.) Perfetti et al. (Note 2) have begun to
take this logic seriously in their investigations. (It is interesting that
people shy away from the word "cause" in this field; Mann and her associates
talk of "presaging" and Perfetti et al. talk of "enabling.")
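
The logic of the time-lagged comparison can be sketched in a few lines of
Python. The simulation below is purely hypothetical: it assumes a world in
which phonetic skill at Time 1 feeds reading at Time 2 but not the reverse,
and the variable names, path weights, and sample size are illustrative
assumptions rather than estimates from any study cited here.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)   # fixed seed so the sketch is repeatable
N = 5000                 # assumed number of children

# Assumed causal structure: early phonetic skill contributes to later reading,
# but early reading does not contribute to later phonetic skill.
phon_t1 = [rng.gauss(0, 1) for _ in range(N)]
read_t1 = [0.3 * p + rng.gauss(0, 1) for p in phon_t1]
phon_t2 = [0.7 * p + rng.gauss(0, 0.7) for p in phon_t1]
read_t2 = [0.4 * r + 0.5 * p + rng.gauss(0, 0.7)
           for r, p in zip(read_t1, phon_t1)]

r_phon1_read2 = pearson(phon_t1, read_t2)   # predictor early, criterion late
r_read1_phon2 = pearson(read_t1, phon_t2)   # criterion early, predictor late

print(f"phonetic skill (T1) with reading (T2): {r_phon1_read2:.2f}")
print(f"reading (T1) with phonetic skill (T2): {r_read1_phon2:.2f}")
```

With the causal arrow built in this direction, the first cross-lagged
correlation exceeds the second. Reversing the assumed paths would reverse the
asymmetry, which is why both variables have to be measured at both times
before the comparison means anything.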

The danger is of course that the kindergartners who did well in Mann's
segmentation task are those who had already learned to read, and they
performed well in segmentation precisely because they had learned to read.
The linguistic awareness that allows segmentation would then be a consequence
of reading acquisition and not a precondition for it. At a different level,
with second language learning, I can testify that it was only when hit with
Latin that I began to gain awareness of grammar in my own language. Thus, it
may be a general rule that "linguistic awareness" is a consequence of formal
instruction rather than a precondition for it. Would learning to read result
in children's relying more on a phonetic short-term memory code than before?
The objection I am making is less attractive in this instance but the need for
something like causal analysis is no less pressing.

In conclusion, I want to be clear that the value of Mann's contribution,
and that of her colleagues at Haskins Laboratories, is not weakened by our
ignorance of which way the causality goes. It is important that reading seems
specifically to track phonetic skills in children, even when IQ is removed.
That association has received impressive documentation by the Haskins group.
By comparison, there is only a loose set of suggestions that other cognitive
factors play a central role. Anyone wishing to advance one of these
suggestions seriously faces an enormous task. The remaining two papers in
this group have tried to establish just that, and so it is with narrowed eyes
that I turn to them.

The Morrison Paper 

According to Morrison (in press), the controlling deficit with
disadvantaged readers is their difficulty with irregular rule systems, such as
grapheme-to-phoneme correspondences in English. One question that needs to be
raised, in connection with the Morrison paper, is to what extent the problem
lies in one particular irregular rule system (spelling-to-sound
correspondences in English) as opposed to a general deficit with all irregular
rule systems. If it is "knowledge about words and how they are pronounced"
(Morrison, in press) that is to blame, then the question becomes how this
hypothesis is any different from that of the Haskins group or from one of the
"processing deficit" hypotheses that Morrison wishes to reject. It sounds to
me as if the failure to translate letters into their corresponding sounds is
none other than a failure to achieve phonetic coding.

It is not clear, either, whether the irregularity of English spelling
rules, by itself, even contributes to the difficulty that some American
schoolchildren have learning to read: If the irregularity were to blame, then
in languages such as Spanish, there should be little or no difficulty; the
same would be true of different writing systems, such as Japan's, which do not
use the alphabetic principle. However, recent evidence indicates such
language communities do indeed see reading disability among their children
(Stevenson, Stigler, Lucker, Lee, Hsu, & Kitamura, in press), previous claims
to the contrary notwithstanding. (I thank Robert Sternberg for bringing this
article to my attention.)

If, on the other hand, Morrison wants to suggest that disabled readers 
are poor at mastering any irregular rule system, then another two questions 
emerge:

The first of these is whether it is only because irregular rule systems 
are more difficult than regular rule systems that poor readers seem to have 
particular trouble with them. The sad fact is that easy tasks seldom produce 
large differences between normals and disabled populations, whereas difficult 
tasks do. This is true whether one is looking at normal and disabled readers,
normal and amnesic adults, or at young and elderly populations. I have
spelled out this problem in some detail in Crowder (1980) for the case of
aging and memory capacity. What it means is that we should be particularly
suspicious when group differences emerge only, or especially, in the most
difficult of the tasks or conditions under study. Morrison (in press)
acknowledges that there may be ceiling effects in the data of his Figures 1,
2, and 3, in which the difficulty of individual letter-sound rules is shown
separately for normal and disabled readers. That admirable candor still does
not alert us to how insidious the problem is. For example, among the
individual conditions shown in those figures, I calculated the Pearson
correlation between the difference between normal and disabled readers and the
overall performance level of the normal readers. This correlation was -.60.
Furthermore, a look at the figures shows that even within letter classes, this
correlation was substantial.
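
The check described above can be sketched in a few lines. What follows is only
a minimal illustration, assuming hypothetical percent-correct scores for six
conditions; the numbers are invented for the example and are not Morrison's
data, and the function name pearson_r is likewise just a label chosen here. It
correlates the normal readers' performance level in each condition with the
size of the normal-minus-disabled difference in that condition.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical percent-correct scores per condition (NOT Morrison's data).
normal_readers = [98, 95, 90, 82, 75, 68]
disabled_readers = [96, 91, 80, 66, 58, 45]

# Group difference in each condition: normal minus disabled.
group_difference = [n - d for n, d in zip(normal_readers, disabled_readers)]

# A negative r means the gap between groups widens as the task gets harder
# for the normal readers, which is the pattern a ceiling effect would produce.
r = pearson_r(normal_readers, group_difference)
print(f"r = {r:.2f}")
```

A clearly negative value in such a check does not show that the group
difference is an artifact, but it does mean the difference cannot be
interpreted without first ruling out task difficulty as the source.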

The second question raised by the assertion that there is a general 
problem with irregular rule systems, in disabled readers, is what confirming 
evidence there is from outside the realm of reading. In the absence of hard 
evidence that disabled readers are systematically poor with irregular rule 
systems of any kind, the Morrison (in press) hypothesis would have to be taken 
on faith. The fact is, there are several pieces of evidence that rule
regularity is not a relevant dimension to reading disability: (1) Mann (in
press), in her Figure 1, has shown that the failure of poor readers to use a
phonetic code in short-term memory extends to spoken sequences, as well as
written. It cannot be claimed that spelling-to-sound rules are to blame when
there is nothing written in the experimental procedure. (2) There is the
Brady, Shankweiler, and Mann (in press) experiment, showing that poor readers
are at a disadvantage in perceiving phonetic segments through speech (but not
naturalistic sounds). Again, when there is no writing, we cannot talk of a
spelling-to-sound conversion problem. Finally, (3) there is evidence that
poor readers are in trouble with rule systems that are completely regular.
Supramaniam and Audley (Note 3) have examined reading in seventh- through
ninth-graders in relation to the Test of Primary Mental Abilities. They found a
correlation of .72 between the numerical-arithmetic subscale of this test and
word recognition, the highest association in their data. This last result
supports the claims of Morrison and of Wolford that reading disability is more 
general than just a reading problem. But it extends this claim in just the 
wrong direction for Morrison's hypothesis, arithmetic being perhaps the most 
well-behaved rule system we have! 

The Wolford and Fowler Paper 

Wolford and Fowler (in press) have presented an important new observation
about the difference between good and poor readers: The latter are
systematically unable or disinclined to make use of partial information to
select a correct alternative. They noted that the apparently greater use of
phonetic information by good readers than by poor readers is inferred, by the
Haskins group, from the relative prevalence of errors that preserve one
phonetic aspect of the correct item. In a spirit of magnificent skepticism,
they observed that poor readers' failure to use partial information is an
alternative explanation for the same data.

The question was then why the good readers don't also use partial visual 
information and, in so doing, commit errors of visual confusion. Wolford and 
Fowler responded correctly that nobody makes visual confusions in the
short-term verbal memory task that Conrad (1967) and others have used, neither
young subjects nor adults. To offer a fair opportunity for good readers to make
use of partial visual information, then, we need a task where it is plausible
to expect visual factors to be more important than they are in short-term
memory. Such a task is the so-called whole-report procedure. In this task, the
same number of letters is presented for report (four) as in the short-term
memory task. However, they are presented simultaneously for only 117
milliseconds, rather than successively at a rate of 600 milliseconds apiece.
Furthermore, recall of the letters is immediate in the whole-report task, not
delayed by numerical distraction as in the short-term memory task. It is likely
before the fact, therefore, that the limiting factor should be memory in the
short-term memory task and visual acuity in the whole-report task. Sure enough,
adults make primarily visual errors in the latter, not phonetic errors
(Wolford & Fowler, in press).

The striking new result turned in by Wolford and Fowler (in press) is
that on the whole-report task, good readers make significantly more visual
confusions than the poor readers, who do not differ from chance. There were
not any appreciable phonetic confusions for either group in whole report.
With the same subjects, and comparable stimulus materials, the Haskins-Conrad
result was replicated for short-term memory; there, the confusions were all
phonetic and good readers made more of them than poor readers. The force of
this pattern of results is to produce an enormous leap in the generality of
the confusion-error result: As Wolford and Fowler say, the more general, and
therefore preferable, conclusion is that the good readers are better able than
the poor readers to deal with stimuli analytically, and to use partial
information to select a response choice. This conclusion is greatly enhanced
by the two other experiments Wolford and Fowler (in press) report. I shall
not describe them here, but both generalize the partial-information hypothesis
in tasks that are satisfactorily different from the letter-string tasks
described above (and from each other).

Although they practice the artifact-prone matching technique of dealing
with IQ (Number 1 in the list given earlier in the paper), Wolford and Fowler
place themselves among those who consider the skills in reading (especially,
using partial information) inherent in the very definition of intelligence.
The problem would then become to set out the individual skills measured in IQ
tests and see which of them load most heavily on the partial-information
factor. It may well be that Wolford and Fowler themselves have stated the
crucial process a bit too narrowly and that, as they suggest in the closing
sentences of their paper, the really pivotal skill is the capacity for
analysis; without analytical capacity, using partial information and a great
many more things are difficult. The experiments Wolford and Fowler offer are
not really capable of distinguishing the capacity for analysis of parts within
a whole from using those parts for response selection. It is to be hoped that
yet more converging investigations can distinguish these possibilities.

So, perhaps disabled readers are less intelligent than normal readers
with respect to analytic skills. I expect this hypothesis will be a valuable
one with regard to "garden variety" poor readers. I reserve the right to
suggest that there may be a special class of disabled readers, sometimes
called dyslexics, for which this analysis is insufficient (see Crowder, 1982,
Chapter 11). These are the individuals whose auding is perfectly normal and
grossly discrepant from their reading, those who form a bump at the low end of
the reading-skill distribution, those with a familial history of reading
problems, and those for whom the ratio of boys-to-girls affected approaches
4/1. I take it this symposium does not aim specifically at this very special
population and so I shall not continue in this vein; however, I personally
would rather we reserved terms like "reading disability" for these children
and adults.

Lessons to Take Home

There is no shortage of theories and hypotheses in the area of reading
disability. What we need more of are facts that fit together. The hypotheses
will surely come and go, even as they have in the most advanced sciences, but
the facts, if generated by clean experimental or quasi-experimental logic,
will endure. On these terms, I believe we can carry away two very solid new
pieces of factual information from this set of papers. 

1. Good readers make visual confusions more than poor readers in a whole
report task. I have just finished reviewing this finding of Wolford and
Fowler (in press) and so I won't harp on it more now. I think it puts in a
more general light the "special relationship" between phonetic processing and
reading established by the Haskins group.

2. Brady, Shankweiler, and Mann (in press) have shown that good and poor 
readers differ in phonetic perception under noisy stimulus conditions but not 
in identification of naturalistic sounds.

These two new facts may be rationalized together by the assumption that
when noise is added to speech it results in fragmented stimuli, similar to
those postulated by Wolford and Fowler to be especially hard for poor readers
to use.

In answer to Morrison's (in press) challenge then (why reading?), the
weight of the new evidence points in the direction of a general answer. It is
not just reading that suffers in poor readers; they are subject to deficits
elsewhere in cognitive functioning. We have seen the poor readers at a
disadvantage listening to speech, remembering "meaningless" Chinese
characters, and, in the work of Supramaniam and Audley (Note 3), performing
poorly in numerical-arithmetic skills. Thus, if Morrison meant "Why reading,
and not other skills as well?", we can answer that the other skills are, after
all, affected. For future investigators, a big priority for the agenda is then
to see which "other skills" are the ones that go with reading. On this matter,
the present papers have formed a promising beginning.

FOOTNOTE

1A reviewer has pointed out, quite correctly, that we should not be glib
in assuming that "strictly auditory tasks" would not be affected by literacy.
Knowing the orthography may well influence lexical representation and
organization. For example, Seidenberg and Tanenhaus (1979) demonstrated
orthographic effects in rhyme monitoring with only auditory stimuli. On the
other hand, not all speech perception tasks would likely be subject to
orthographic influences. Highly analytic tasks, like the rhyme monitoring of
Seidenberg and Tanenhaus, would be expected to show such effects while direct
speech perception would likely not. With the nonsense syllables used in the
Brady et al. study (in press), there is no orthographic representation waiting
in the lexicon, of course.
J. A. Scott Kelso+



"As long as man has existed, he has puzzled over the 'agencies' by which
animal action was affected." So said Franklin Fearing (1950, p. 1) in a
remarkable little book on the history of reflex action and its relationship to
the development of physiological psychology. However, although some notable
psychologists have contributed to an understanding of the processes underlying
the organization of movements, it is probably fair to say that in the last
thirty years or so, psychology in general has expressed only a dabbling
interest. There are signs, and this book is one of them, that the times are
changing. Part of the impetus comes from neuroscience, which has told us for
a long time that a healthy portion of the brain contributes to the generation
and regulation of movements (e.g., Evarts, 1979). If, as the popular press is
wont to inform us, the brain constitutes "the last frontier," the study of
motor control becomes even more interesting than one might first have thought.
Still another push for a more serious consideration of action processes comes
from the newly developing area of cognitive science. Donald Norman, for
example, in his paper on "Twelve issues for cognitive science" (Norman, 1980)
identifies "the problem of output, of performance... [as] too long neglected,
now just starting to receive its due attention" (p. 23), and the issue of
skill as not just "...a combination of learning and performance. More than
that, perhaps a fundamental aspect of cognition" (p. 24).

Of course, none of this is particularly new to a small, and persevering 
group of people in physical education and kinesiology who have been plugging 
away in the laboratory for some years now, experimenting and speculating on 
what goes on when people acquire skill and control movements. The fact is
that for even the simplest of movements, no one really knows. The author of
this book, Dick Schmidt, is a leader in the kinesiology field. Among other
achievements, he has contributed two interesting and provocative papers to
Psychological Review (Schmidt, 1975; Schmidt, Zelaznik, Hawkins, Frank, &
Quinn, 1979) that combine theory and data about the learning and control of
simple movements.



*Review of Motor control and learning: A behavioral emphasis, by Richard
A. Schmidt (Champaign, Ill.: Human Kinetics, 1982). Contemporary Psychology,
in press.

+Also University of Connecticut.
Here Schmidt turns his hand to producing an undergraduate textbook whose
cover claims it to be "...the most comprehensive book on motor behavior to
date." With some reservations, but with no little sense of awe, I have to
agree. Previous textbooks, in the opinion of many, have possessed a sort of
supermarket quality: plenty of isolated facts collected from all sorts of
diverse settings but little or no structure to hold them together. In short,
as someone said in a rather different context, they turned out not to be worth
your green stamps. This book is a welcome change and, as a textbook geared to
undergraduates "...with little or no background in experimental psychology or
the neurosciences" (p. xix), it represents a first-class effort.

The emphasis of the book, as the title indicates, is largely behavioral. 
Its major aims are "...to understand the variables that determine motor
performance proficiency, and to understand the variables that are most
important for the learning of movement behaviors" (p. 3). Yet the book also
promises an integration of the behavioral literature with the fields of
biomechanics and neural control. Though this is welcome, it probably
overextends the author a little, as indeed it might anyone. Biomechanics and
neural control are rapidly expanding fields whose tools and techniques are
constantly changing. Each discipline could contribute not one, but many books
to the area of motor control. It is unlikely that investigators and teachers in
either field will get too excited about the integration presented here. Each,
I suspect, might feel a bit shortchanged. In Chapter 3, for example, there is
a brief, though useful discussion of kinematics. But this just about covers
Schmidt's treatment of biomechanics and is probably not enough to keep the
biomechanics people happy. As for neural control, much of the author's
treatment deals with work on locomotion and so-called "spinal generators" (in
relation to open-loop, motor programming processes discussed in Chapter 7),
although there is also a fairly brief presentation of the role of sensory
receptors that might contribute to motor control (in Chapter 6, which
emphasizes closed-loop processes). I doubt if this is enough for the student
interested in integrating motor behavior with associated neural control
processes, although it provides a good hint of the possibilities.

For me, the guts of the book are in Section 2, which contains eight
chapters under the heading Motor Behavior and Control. These are bounded by
rather conventional but necessary chapters (at least if a semester course is
envisaged) dealing with the history of the area and scientific methods
(Section 1) and motor learning and memory (Section 3). The latter section is
a bit disappointing; there is no recognition of the important biological
constraints perspective on learning (see Garcia, 1981; and Johnston, 1981, for
recent review), and ethological approaches are completely ignored. As
Saltzman and myself have recently pointed out (Saltzman & Kelso, in press), the
area of motor memory and learning continues to deal with "items" as relevant
stimuli (cf. Schmidt, Chapter 4 and p. 606), a term that is completely neutral
to the kinds of functions people and animals perform. Treating motor memory
as a collection of items linked to traces "in" memory is a vestige of old
verbal learning theory and associationism. It tacitly assumes what Seligman
(1970) called "equivalence of associability," that it is equally possible to
learn any relationship between stimulus and response; it fails to recognize
important evidence that animals do not operate in universal contexts, that
they are not general-purpose machines (e.g., Bolles, 1972). In contrast to
Schmidt's critique of task-oriented approaches (p. 82ff), maybe it is time to
give more thought to the types of tasks organisms (including humans) perform,
in recognition of the fact that those tasks that meet existing constraints are
easier to perform than others that do not. Perhaps, as Greene (1971) and
others have long argued, we need a theory of tasks that takes as its goal a
clarification of the intrinsic relationship between a particular environmental
structure and the animal, rather than focus, as Schmidt does, on the
characteristics of animals themselves (e.g., the heavy emphasis on the
composition and structure of so-called motor programs, a topic that I'll
return to).

Relatedly, the psychologist reading this book may be surprised to find
very little on the action system as a coherent perceptual-motor, or, for that
matter, motor-perceptual unit. In fact, this book deals with perception
hardly at all. To the extent that it does, it does so in a way that many
might find unsatisfactory. For example, some reference is made to the
important role of optical flow fields in the visual control of movement (e.g.,
p. 96). However, these are treated as no more than inputs to stimulus
identification in a conventional stage model of information processing. Of
course, the latter involves the assumption that the system constructs its
various memory representations on the basis of its inputs, while the
theoretical import of the optical flow work is that the information for action
is readily available to a suitably attuned performer. Thus, in this viewpoint
(Gibson, 1966, 1979), skill does not require the construction or accumulation
of cognitively based representations; rather, the information being picked up
becomes more and more precise as skill develops. Putting Gibson in with
information processing approaches misleads more than informs. This aside,
the main point is that a book with a largely behavioral emphasis might have
elaborated more fully the importance of perception for the planning and
control of action. Arbib (1980, 1981) has made some nice contributions in
this regard, which are conspicuous by their absence in Schmidt's book.

Also, Schmidt could be criticized (and this may be nit-picking on my
part) for perpetuating a distinction between "sensory" and "motor," which in
the minds of many no longer holds water. Yet it crops up in a number of
places throughout the text. In his discussion of motor short-term memory
(itself possibly a misnomer), for example, Schmidt harbors the suspicion that
the memory wasn't about motor things at all, but "...rather was concerned with
the retention of sensory information about the feedback associated with the
target position" (p. 623). And, in his earlier mention of Fukuda's
observations that many skilled athletes exhibit fundamental movement patterns
that resemble reflexes, the author suggests that it is not because the tonic
neck reflex is being recruited when the baseball player jumps to catch the fly
ball, but rather because the player is "merely looking at the ball" (p. 224).
But in both these examples and elsewhere in the book, the author can be
faulted for trying to draw too simple a contrast between sensory and motor
events. In the days of Bell and Magendie this may have been permissible; in
1982 (and indeed much earlier), the data no longer allow it. Interactions
between so-called afferent and efferent pathways occur at all levels of the
neuraxis (cf. Miles & Evarts, 1979; Roland, 1978; Smith, 1978). Central
signals modulate, and are modulated by, the activities at the periphery;
consequently, attributing undue importance to afference as closed-loop
theories do, and efference, as in motor program theorizing (cf. Schmidt,
Chapters 7 and 8) is at best misguided. Students of motor behavior are
ill-served when the distinction is overly emphasized.
In reading through the book, I was both pleased and surprised that the
author included some issues that have not previously been central aspects of
his work. Among these are a nice discussion of tuning (in Chapters 6 and 8)
and the so-called degrees of freedom problem identified by Bernstein (who, by
the way, was writing as early as the 1920s (e.g., Bernstein, 1926), not as
Schmidt says, the 1940s). For some of us, a rationalization of how the many
potentially free variables become regulated in the course of coordinated
movement remains at the core of a viable theory of action systems. Schmidt
quite rightly points out that the degrees of freedom problem is "...one
difficulty for the closed-loop model, and for any other model that holds that
the contractions of the various muscles are handled by direct commands from
higher centers" (p. 245). However, the "other model" in this case happens to
be very close to the author's favorite topic, motor programs, which, in spite
of some provisos that have been introduced for the involvement of feedback
during movement execution, still remains in the modified definition as "...a
central structure capable of defining a movement pattern" (p. 299), and still
retains "...the essential feature of the open-loop concept" (p. 299), that is,
direct command specification to muscles.

Thus, Schmidt argues that Wadman, Denier van der Gon, Geuze, and Mol's
(1979) work on the triphasic electromyographic pattern between agonists and
antagonists during rapid elbow flexion can be explained by motor programming:
"It is as if the individual said, 'Do the arm movement,' and a motor program
was called up that handled all the details, producing the EMG pattern found.
In this way the number of degrees of freedom involved in the limb action, from
the point of view of the stages of information processing, is reduced to one"
(p. 247). Of course, it is precisely this type of account that Bernstein
warned us against; that is, when asked the question: "How are the degrees of
freedom of the motor apparatus regulated?", one responds that the details are
taken care of by a motor program. This is a fait accompli, but not an
explanation.

Elsewhere, my colleagues and I have argued that the strategy of assigning
orderly and regular behavior to a construct such as a program or reference
level that embodies said order and regularity is fraught with problems. Here
is not the place to elaborate these (but see Kugler, Kelso, & Turvey, 1980;
Kelso, 1981; Kelso, Holt, Kugler, & Turvey, 1980) except to emphasize that an
alternative strategy is available. Such a strategy seeks to explicate the
necessary and sufficient conditions for orderly behavior to arise, and to
understand the dissipation of the body's many degrees of freedom as an a
posteriori fact of its dynamical organization, not as an a priori prescription
for the system.

For example, it is very tempting, on the basis of elegant kinematic
evidence by Shapiro, Zernicke, Gregor, and Diestel (1981) regarding the
proportions of time spent in the various phases of human locomotion, to
assume, as Schmidt does, that "...a given gait is controlled by a given
program" (p. 315, see Schmidt, Figures 8-12). But this account ranks in
Rudyard Kipling's "Just so" category. Because one observes a different phasic
pattern for walking and jogging, there is no reason to conclude that walking
and jogging are controlled by different programs.
Indeed, if recent work on horse locomotion is an indicator, a very
different account is possible, and one that for this reviewer, at least, is
slightly more revealing. Thus, Hoyt and Taylor (1981) have found, using
metabolic measures of oxygen consumption, that the minimum energy cost per
unit distance is almost the same for a horse whether it walks, trots, or
gallops. These three stable locomotory modes, therefore, correspond to
regions of minimum energy dissipation. Like many other examples of phase
transitions in nature, these modes can be "broken" when the system becomes
unstable. Thus, it becomes extremely expensive energetically for a quadruped
to maintain a walking mode at increased speeds. A sudden and discontinuous
transition occurs at a critical velocity value and the animal switches into
the next stable, and less energetically expensive mode. This is not a
hard-wired and deterministic phenomenon: horses can trot at speeds at which
they normally gallop, but as anyone who has watched pacers on a race track
knows, it takes a lot of training and is metabolically costly. My point is not
that we know a lot about gaits and gait transitions (we don't); it is that
there is promise here in an account that draws on theories of nonlinear
dynamics and nonequilibrium phenomena in general. Common features of such
phenomena (and there are some remarkable similarities across many different
natural events, cf. Haken, 197?) are that when a stable system is driven beyond
a certain critical value, bifurcations may occur and qualitatively new forms
arise. Importantly, for Schmidt's interpretation, no "program" or "central
representation" of the upcoming behavior exists prior to the occurrence of the
new space-time organization.
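
The energetic side of this argument can be made concrete with a toy
calculation. The sketch below is an illustration only, under loudly stated
assumptions: each gait is given a hypothetical U-shaped curve of energy cost
per unit distance against speed, all three curves share the same minimum (as
Hoyt and Taylor report to be approximately the case), and every number, name,
and curve shape is invented for the example rather than taken from their
measurements. It is a static cost comparison, not a dynamical model of the
transition itself.

```python
# Hypothetical U-shaped cost-of-transport curves, one per gait.
# Each gait has a preferred speed (m/s) at which its cost per unit
# distance reaches the same minimum; all numbers are illustrative only.
GAITS = {
    "walk": {"preferred_speed": 1.2, "curvature": 1.5},
    "trot": {"preferred_speed": 3.2, "curvature": 0.5},
    "gallop": {"preferred_speed": 5.6, "curvature": 0.3},
}
MIN_COST = 1.0  # shared minimum cost per unit distance (arbitrary units)


def cost_per_distance(gait, speed):
    """Quadratic bowl centered on the gait's preferred speed."""
    p = GAITS[gait]
    return MIN_COST + p["curvature"] * (speed - p["preferred_speed"]) ** 2


def cheapest_gait(speed):
    """Gait with the lowest cost per unit distance at this speed."""
    return min(GAITS, key=lambda g: cost_per_distance(g, speed))


def transition_speeds(lo=0.5, hi=7.0, step=0.01):
    """Scan speeds upward and record where the cheapest gait changes."""
    transitions = []
    n_steps = int(round((hi - lo) / step))
    previous = cheapest_gait(lo)
    for i in range(1, n_steps + 1):
        speed = lo + i * step
        current = cheapest_gait(speed)
        if current != previous:
            transitions.append((round(speed, 2), previous, current))
            previous = current
    return transitions


for speed, old, new in transition_speeds():
    print(f"near {speed} m/s: {old} -> {new}")
```

In this toy version the switch points fall where neighboring cost curves
cross, so holding a gait past its crossing speed is possible but increasingly
expensive, which is the qualitative pattern described for pacers above.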

In conclusion, many of my remarks have really spoken to the second main
claim on the cover of this book, that "...New hypotheses are
advanced...resulting in new insights and, in some cases, conclusions that
differ from prevailing views." My remarks attest to the highly volatile and
stimulating nature of a field that is presently undergoing continuous change.
The problems of action, as I remarked at the beginning, are deep ones that
have puzzled scientists and philosophers for a long time. A textbook in this
area is not like Gray's Anatomy; it reflects only one person's view of the
state of the art. To the extent that a textbook is a desirable thing in the
motor behavior area (I believe it is, but many, I suspect, might find it
premature), this one by Schmidt presents the issues as he sees them in a
coherent and well-organized way. I recommend the book highly to those
psychologists who want to find out more about motor control. But in the same
breath, I would warn them that what they see before them today may be grist
for the mill tomorrow. That's as this reviewer, and I suspect the author,
would want it to be.
A typical middle-class American child of six years can recognize nearly
8,000 root words (according to Mildred Templin, 1957). The child has learned
these words over roughly four years, at an average rate of five to six a day.
Each word is formed, according to a set of language-specific rules for
constructing syllables, by combining a few of the several dozen articulatory
patterns that generate the consonants and vowels of an English dialect. How,
we may well ask, does the child learn the perceptual and motor patterns that
will permit it to build so large a lexicon in so short a time?

That is the question to which these two volumes are addressed. They
comprise the proceedings of a conference of 34 linguists and psychologists,
convened by the National Institute of Child Health and Human Development in
Bethesda, Maryland, during May, 1978. They form a compendium of theory and
research done over the previous decade in the young field of child phonology.
According to a rough count by Jenkins (given in a chapter of shrewd comments,
criticism, and advice at the end of Volume 2), over 90% of the references in
these volumes are to works published since 1968, and over 60% to works
published since 1973.

Child phonology begins (as Ferguson and Yeni-Komshian [Vol. 1, chap. 1]
remind us in their useful introductory survey of its history) with the
publication of Jakobson's Kindersprache, Aphasie und allgemeine Lautgesetze in
1941. Jakobson's proposals quickly became standard dogma because they offered
an elegant integration of phonological development into the then-dominant
structuralist account of phonology. Central to Jakobson's position was the
view that babbling during the child's first year was mere random articulatory
exercise and that learning to speak was a linguistic matter, abrupt in onset
and entailing the development of particular phonemic oppositions before other
particular oppositions in a fixed, universal order.

However, the discontinuity between babbling and speech is more apparent
than real, the consequence, Lieberman (Vol. 1, chap. 7) suggests, of the
phonetician's lack of a descriptive framework for pre-speech. MacNeilage
(Vol. 1, chap. 2) points out that this lack is now being rectified. He
concludes a succinct account of what we know and do not know about adult
control of speech production with the enticing suggestion that studies of
pre-speech may be amenable to treatment in terms of the coordinative structures
of action theory. A coordinative structure, or synergism, is a set of muscles
constrained to act as a unit. For example, Stark (Vol. 1, chap. 5) provides a
framework for classifying vocal behavior during the first 1^ weeks of life and 
finds that many features of adult speech are present but uncoordinated. Thus,
variations in pitch and vocalic structure are observed during infant cry,
whereas consonantal sounds such as clicks, friction noises and trills occur
during vegetative processes. Stark proposes that the development of speech
involves the harnessing and coordinating of these features into the precisely
timed patterns of babble.

Stark's approach meshes neatly with that of Oller (Vol. 1, chap. 6), who
reports validating studies of a framework for describing the development of
phonetic control during the first year of life, from what he terms the quasi-
resonant nuclei of nonreflexive vocalizations in the first month to the
variegated babbling of the eleventh and twelfth months. His system promises
to break a bottleneck in the study of pre-speech vocalization, taking the
first step toward norms that may permit early diagnosis of deafness or other
pathologies. However, Oller's chief concern is with the theoretical issue of
explaining the regularities of infant development. Do they simply reflect
general anatomical and physiological maturation? Is there evidence of
conscious, speech-related vocal activity during the first year of life? When
do the first signs of shaping by the language community appear?

The last question is also raised by Lieberman (Vol. 1, chap. 7) in a
preliminary report on a longitudinal acoustic study of the speech of a small
group of normal, middle-class children from birth through pre-school.
Particularly valuable here, both for normative purposes and as evidence of
changes in phonetic scope of the vocal tract, are a dozen formant frequency
plots on which one can observe the steadily increasing extent of each child's
vowel quadrilateral. Interestingly, the children do not mimic adult formant
frequencies, even though for many vowels they could do so, by appropriate
vocal tract maneuvers. Instead, already by the fourth month, vowels are
falling into their "proper" acoustic relations, a fact consistent with the
hypothesis of an innate normalization mechanism. The data also discount
Jakobson's claim of discontinuity by illustrating the smooth emergence of the
vowels of words from the vowels of babble.

More on Jakobson 

Lest it seem that I am flogging Jakobson's horse past death, let me note
that his theories are cited (and disputed) in 10 of the 13 chapters in Volume
1. Indeed, Menn, in a lucid and thought-provoking chapter (Vol. 1, chap. 3)
on the historical development of phonological theory (with the witty epigraph
"Beware Procrustes bearing Occam's razor"), suggests that "the entire cautious
and meticulous modern tradition of child phonology field-work was forged
by...[the] necessity" of establishing counter-evidence to Jakobson's arguments
(p. 28).

This is, in fact, precisely the focus of Macken's chapter on the
acquisition of syllable-initial stop systems (Vol. 1, chap. 8). There are two
possible tests of Jakobson's claim of a fixed, universal order of development:
across children within a language and across languages. Macken does both.
She tests Jakobson's prediction of an invariant sequence for stops (/p/ before
/p:t/ before /p:k/) on case studies (by others) of five English-learning
children and concludes that the best she can do is to reformulate the
prediction as "front before back" and then assign it no more than a high
probability of being right. Testing Jakobson's prediction that the first
stops will be voiceless unaspirated, on her own English and Spanish data, she
finds strong support, but also evidence of language-specific patterns in the
timing, ordering, and phonetic structure of the first stop contrasts, /bdg/,
that seem to reflect relative frequencies of these stops in the language being
learned.

The role of language-specific frequencies is, of course, very much to the
point and still far from clear. Locke (Vol. 1, chap. 10), reporting a novel
and ingenious study on the prediction of child speech errors, presents
arguments and evidence that there is no such effect. Nonetheless, there does
seem to be much more cross-language and within-language variation than
Jakobson would predict. Thus, in a careful study of the production of word-
initial English fricatives and affricates by 73 children between two and six
years, Ingram and his colleagues (Vol. 1, chap. 9) found much the same order
that previous studies have reported, but with considerable variation from
child to child, from word to word, and even from time to time within a word.

Contextual variability has, incidentally, no less clinical than theoretical
interest. Menyuk (Vol. 1, chap. 11) reports studies of both perception
and production, demonstrating that children with suspected central nervous
system abnormalities may present quite different patterns of error according
to whether they are assessed with nonsense syllables or familiar words, in a
test situation or while playing with other children. Taken with the numerous
studies reported in these volumes in which normal children display their
diversity, Menyuk's report should encourage caution in the assessment of a
child's phonological capacity.

Continuity and Discrimination Abilities 

In Volume 2, Perception, we again confront the continuity issue--though
not explicitly formulated, perhaps because Jakobson himself did not consider
the infant's perceptual capacities. However, Blumstein, once Jakobson's
student, fills the gap in a chapter (Vol. 2, chap. 2) reporting her work with
Stevens on the spectral structure of stop consonant release bursts. Crossing
the psychology of Hume with the linguistics of Jakobson, Blumstein posits
"innate biological mechanisms...selectively tuned to primary, [linguistically]
unmarked, invariant acoustic cues" for place of articulation, in conjunction
with "marked...secondary context-dependent cues" whose linguistic function the
infant learns "as a direct consequence of the cooccurrence of these cues with
the invariant acoustic properties" (p. 19).

The hypothesis of "innate biological mechanisms" stems, of course, from
the many studies precipitated by Eimas and his colleagues (1971) when they
successfully transposed from visual to auditory research the high amplitude
sucking procedure for assessing an infant's discriminative capacity during the
first three to four months of life. Eilers (Vol. 2, chap. 3) describes the
paradigm and others suited to later age ranges: heart rate variation as an
index of attention (1-6 months) and visual reinforcement (by an animated toy)
of head turning toward the locus of a stimulus change (6-18 months). Eilers
also reviews many studies using these techniques to demonstrate that infants
can discriminate virtually every major acoustic property that underlies a
phonemic contrast in English. Few negative findings have been reported, and
such inconsistencies as there are between studies seem to be due to inadequate
acoustic specification. For example, researchers generally (Eilers is no
exception) seem unaware that voice onset time (VOT) was originally defined as
a special case of a general articulatory variable (the timing of laryngeal
action) that would generate any and all of more than a dozen acoustic cues to
voicing distinctions. Unwary synthesis can thus produce different responses
to the same value of VOT due to differences in, say, release burst energy or
the onset frequency of the first formant.

In any event, what we now have is a rough taxonomy of infant psychoacoustic
capacity for discriminating (not categorizing) certain dimensions of
speech sounds--in all likelihood, a general mammalian capacity that tunes,
rather than is tuned to, speech. Kuhl (Vol. 2, chap. 4) has higher goals.
Her current research makes direct tests, by the head turning technique, of an
infant's capacity to form categories of speech sounds. Her data show that 6-
month-old infants can learn to categorize: (1) tokens of /a/ versus /i/ and
of /a/ versus /o/, spoken by a male, a female and a (synthesized) child on two
different pitches; (2) tokens of syllable-initial or syllable-final /s/ versus
/ʃ/, and /f/ versus /θ/, spoken by several talkers with /i,a,u/; (3)
(according to preliminary data on a single infant) tokens of initial, medial,
or final /d/ versus /g/, spoken with /i,a,u/. This research directly
confronts crucial issues of segmentation and invariance, across speakers and
phonetic contexts, and is, in my view, the most interesting current work in
the area.

Nonetheless, if 6-month-old infants are indeed able to segment syllables
and form categories of their component consonantal and vocalic portions, what
are we to make of the apparent perceptual difficulties of older children?
Barton (Vol. 2, chap. 6) provides a critical analysis of the methods used to
assess a child's capacity to discriminate (that is, distinguish between two
stimuli) and identify (that is, refer a stimulus to an internal representation,
perceive phonemically). Whatever the task, performance varies with many
factors, such as word status (real vs. nonsense), word familiarity, feature
composition, and of course, age. In general, 2- to 3-year-old children seem
to identify at least familiar words quite accurately. But why should
familiarity be a factor at all?

Of course, some sounds are more difficult than others. Barton shows that
there is no evidence for any general order of perceptual acquisition in either
Russian or English (the only languages on which there have been studies, it
seems). But certain distinctions are notoriously difficult: for example, /f/
versus /θ/ (on which Kuhl's infants were successful), or /r/ versus /l/. For
the latter contrast, Strange and Broen (Vol. 2, chap. 7) report a careful
study of twenty-one 3-year-olds in which they found evidence of a perception-
production link: If a child had difficulty with the identification task, she
was more likely to have difficulty producing /r/ or /l/ than if she did not.

Perhaps the solution to the puzzle lies in paying more attention to just
how a child's perceptual capacity is measured. Strange and Broen (Vol. 2,
chap. 7) provide an excellent discussion of this matter as it bears on the
relation between perception and production. They suggest that measures of the
two processes should be in some sense coordinate: "It would seem...more
reasonable to compare...[the]...kind of perceptual capacity [assessed in
infants] with an empirical assessment of the physiological capacity to produce
these sounds (i.e., with motoric capabilities independent of linguistic
volition)...[as in]...prebabbling vocalizations" (p. 149). They point to our
lack of "a concept of 'intentional, coordinated perception'...comparable to
our understanding of speech production as the articulation of lexical items
with the intent to communicate linguistically" (p. 130).

Implicit in this argument is the assumption that perception and production
somehow march together. Straight (Vol. 1, chap. 4) in a somewhat naughty
and polemic chapter argues, to the contrary, for two separate and distinct
components in auditory and articulatory processing. Much of his argument
stems from what he himself acknowledges to be an "egregious...lack of
knowledge of the literature on child and adult speech perception and
production" (p. 67). But he has also been overly impressed by those well-known
cases in which a child knows that she is saying, for example, [fis], when she
should be saying [fiʃ]. This, of course, is what we would expect if learning
to speak entailed the gradual marshaling of subtly interleaved motoric
structures so as to capture the delicacies of dialect.

Perception and Action 

In fact, perhaps the most striking achievement of the child in learning
to speak is that it learns to reconstruct the language of its community with
such precision. One is not surprised that mothers begin to exaggerate their
articulation, clarifying their phonetic execution, just when the child begins
to utter its first words (Malsheen, Vol. 2, chap. 9), nor that a Spanish child
learning English as a second language will display an appropriate shift of a
few milliseconds, away from the Spanish and toward the English boundary, in
judgments of a VOT continuum (Williams, Vol. 2, chap. 10). Perception has
evolved to control action (and action to control perception). There is no
sound reason to believe that the evolution of language has led to their
divorce.

In conclusion, what do these volumes lack? Nothing, I think, except
perhaps a chapter on the pre-speech development and communicative use of
prosody. Allen and Hawkins (Vol. 1, chap. 12) do, in fact, provide a thorough
review of a sizeable literature on the development of syllable stress and
rhythm, as well as a report of their own research on syllabic weight and
accentuation in 3- and 5-year-olds. And Clumeck (Vol. 1, chap. 13) reviews
the acquisition of tone in Thai and Mandarin Chinese, showing that pitch
begins to be used for lexical contrast only when the child begins to use words
modeled on the adult language. What we miss among the chapters on pre-speech
is some account of the infant's first attempts to communicate, and of the
gradual differentiation of segmental from suprasegmental utterance.

Nonetheless, these volumes provide a solid review of an increasingly
complex field with deep implications for our understanding of the biological
bases of speech and language. The editors are to be congratulated on
collecting a group of essays that will certainly influence the direction of
research in the field during the coming decade.

REFERENCES

Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. Speech perception
     in infants. Science, 1971, 171, 303-306.
Jakobson, R. [Child language, aphasia and phonological universals]
     (A. Keiler, Trans.). The Hague: Mouton, 1968.
Templin, M. Certain language skills in children. Minneapolis: University of
     Minnesota Press, 1957.







II. PUBLICATIONS
III. APPENDIX
IV. ERRATA



SR-71/72 (1982)
(July-December)

PUBLICATIONS

Abramson, A. S., & Svastikula, K. Intersections of tone and intonation in
     Thai. In H. Fujisaki & E. Gårding (Eds.), Proceedings of the Working
     Group on Intonation, XIIIth International Congress of Linguists. Tokyo:
     Tokyo University Press, in press.

Baer, T. Basic aspects of laryngeal function as related to singing. In V.
     Lawrence (Ed.), Transcripts of the Eleventh Symposium: Care of the
     Professional Voice. New York: The Voice Foundation, in press.

Baer, T., & Alfonso, P. J. On simultaneous neuromuscular, movement, and
     acoustic measures of speech articulation. In D. Beasely, C. Prutting,
     T. Gallagher, & R. G. Daniloff (Eds.), Current issues in language
     science. Vol. II: Normal and disordered speech. San Diego: College
     Hill Press, in press.

Bell-Berti, F., Henderson, J., & Honda, K. A "view from the top" of the velar
     port. In V. Lawrence (Ed.), Proceedings of the 9th International
     Symposium of the Collegium Medicorum Theatri, in press.

Best, C. T., & MacKain, K. S. Discovering messages in the medium: Speech and
     the prelinguistic infant. In H. E. Fitzgerald, B. Lester, & M. Yogman
     (Eds.), Advances in pediatric psychology (Vol. 2). New York: Plenum, in
     press.

Borden, G. J. Initiation versus execution time during manual and oral
     counting by stutterers. Journal of Speech and Hearing Disorders, in
     press.

Carello, C., Turvey, M. T., Kugler, P. N., & Shaw, R. E. Inadequacies of the
     computer metaphor. In M. S. Gazzaniga (Ed.), Handbook of cognitive
     neuroscience. New York: Plenum, in press.

Collier, R., Bell-Berti, F., & Raphael, L. Some acoustic and physiological
     observations on diphthongs. Language and Speech, in press.

Crowder, R. G. Is it just reading? Comments on the papers by Mann, Morrison,
     and Wolford and Fowler. Developmental Review, in press.

Crowder, R. G. Disinhibition of masking in auditory sensory memory. Memory &
     Cognition, 1982, 10, 424-433.

Drewnowski, A., & Healy, A. F. Phonetic factors in letter detection: A
     reevaluation. Memory & Cognition, 1982, 10, 145-154.

Feldman, L. B. Bi-alphabetism and word recognition. Published proceedings
     NATO Conference on the Acquisition of Symbolic Skills. New York:
     Plenum, in press.

Feldman, L. B., & Turvey, M. T. Word recognition in Serbo-Croatian is
     phonologically analytic. Journal of Experimental Psychology: Human
     Perception and Performance, in press.

Fowler, C. A. Reading research--Theory and practice. (Review of Reading
     research: Advances in theory and practice, Vols. 1 and 2, edited by
     T. G. Waller and G. E. MacKinnon. New York: Academic Press, 1979,
     1981.) Contemporary Psychology, 1982, 27, 522-524.

Fowler, C. A. Review article. (Review of Errors in linguistic performance:
     Slips of the tongue, ear, pen and hand, edited by V. Fromkin. London:
     Academic Press, 1980.) Linguistics, 1982, 19, 819-840.

Fowler, C. A. Converging sources of evidence on spoken and perceived rhythms
     of speech: Cyclic production of vowels in sequences of monosyllabic
     stress feet. Journal of Experimental Psychology: General, in press.

Fowler, C. A., & Turvey, M. T. Observational perspective and descriptive
     level in perceiving and acting. In W. B. Weimer & D. S. Palermo (Eds.),
     Cognition and the symbolic processes (Vol. 2). Hillsdale, N.J.:
     Lawrence Erlbaum Associates, 1982, 1-19.

Goodman, D., & Kelso, J. A. S. Exploring the functional significance of
     physiological tremor: A biospectroscopic approach. Experimental Brain
     Research, in press.

Hanson, V. L. Use of orthographic structure by deaf adults: Recognition of
     fingerspelled words. Applied Psycholinguistics, in press.

Hanson, V. L. Short-term recall by deaf signers of American Sign Language:
     Implications of encoding strategy for order recall. Journal of
     Experimental Psychology: Learning, Memory, and Cognition, 1982, 8, 572-
     583.

Hanson, V. L., & Bellugi, U. On the role of sign order and morphological
     structure in memory for American Sign Language sentences. Journal of
     Verbal Learning and Verbal Behavior, 1982, 21, 621-633.

Liberman, I. Y. A language-oriented view of reading and its disabilities. In
     H. Myklebust (Ed.), Progress in learning disabilities (Vol. 5). New
     York: Grune & Stratton, in press.

Lisker, L. Letter to the Editor. Journal of Phonetics, 1982, 10, 333-334.

Lukatela, G., Kostić, A., Feldman, L. B., & Turvey, M. T. Grammatical priming
     of inflected nouns. Memory & Cognition, in press.

Lukatela, G., Moraca, J., Stojnov, D., Savić, M., & Turvey, M. T. Grammatical
     priming effects between pronouns and inflected verb forms. Psychological
     Research (Psychologische Forschung), in press.

MacKain, K. S., Studdert-Kennedy, M., Spieker, S., & Stern, D. Infant
     intermodal speech perception is a left hemisphere function. Science, in
     press.

MacKain, K. S., Studdert-Kennedy, M., Spieker, S., & Stern, D. Infant's
     lateralized perception of auditory-visual relations in speech. In
     D. Ingram (Ed.), Proceedings of Second International Conference for Study
     of Child Language. Lanham, Md.: University Press of America, in press.

Mann, V. A. Reading skill and language skill. Developmental Review, in
     press.

Mann, V. A., & Liberman, A. M. Some differences between phonetic and auditory
     modes of perception. Cognition, in press.

McGarr, N. S. Differences between experienced and inexperienced listeners to
     deaf speech. Journal of Speech and Hearing Research, in press.

McGarr, N. S., & Harris, K. S. Articulatory control in a deaf speaker. In
     I. Hochberg, H. Levitt, & M. J. Osberger (Eds.), Speech of the hearing
     impaired: Research, training, and personnel preparation. Baltimore:
     University Park Press, in press.

Niimi, S., Bell-Berti, F., & Harris, K. S. Dynamic aspects of velopharyngeal
     closure. Folia Phoniatrica, 1982, 34, 246-257.

Ognjenović, V., Lukatela, G., Feldman, L. B., & Turvey, M. T. Misreadings by
     beginning readers of Serbo-Croatian. Quarterly Journal of Experimental
     Psychology, in press.

Remez, R. E., Rubin, P. E., & Pisoni, D. B. Coding of the speech spectrum in
     three time-varying sinusoids. In C. W. Parkins & S. W. Anderson (Eds.),
     Cochlear implantation. New York: New York Academy of Sciences, in
     press.

Repp, B. H. Bidirectional contrast effects in the perception of VC-CV
     sequences. Perception & Psychophysics, in press.






Repp, B. H. Categorical perception: Issues, methods, findings. In
     N. J. Lass (Ed.), Speech and language: Advances in basic research and
     practice (Vol. 10). New York: Academic Press, in press.

Studdert-Kennedy, M. Limits on alternative auditory representations of
     speech. In Proceedings of the International Conference on Cochlear
     Prostheses. New York: New York Academy of Sciences, in press.

Studdert-Kennedy, M. More analysis, less synthesis, please. The Behavioral
     and Brain Sciences, in press.

Studdert-Kennedy, M. Perceiving speech events. In R. E. Shaw & R. Warren
     (Eds.), Proceedings of First International Conference on Event
     Perception, University of Connecticut, June 1981, in press.

Studdert-Kennedy, M. (Ed.) Psychobiology of language. Cambridge, Mass.: MIT
     Press, in press.

Studdert-Kennedy, M. Discovering the sound pattern of a language. (Review of
     Child phonology, Vol. 1: Production, and Vol. 2: Perception, edited by
     G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson. New York:
     Academic Press, 1980.) Contemporary Psychology, 1982, 27, 510-512.

Studdert-Kennedy, M. On the dissociation of auditory and phonetic perception.
     In R. Carlson & B. Granström (Eds.), The representation of speech in the
     peripheral auditory system. Amsterdam: Elsevier Biomedical Press, 1982,
     3-10.

Turvey, M. T., Feldman, L. B., & Lukatela, G. The Serbo-Croatian orthography
     constrains the reader to a phonologically analytic strategy. Visible
     Language, in press.

Watson, B. C., & Alfonso, P. J. Foreperiod and stuttering severity effects on
     acoustic laryngeal reaction time. Journal of Fluency Disorders, in
     press.

Wolford, G., & Fowler, C. A. Differential use of partial information by good
     and poor readers. Developmental Review, in press.

Wolford, G., & Fowler, C. A. Perception and use of information by good and
     poor readers. In T. Tighe & B. Shepp (Eds.), Development of perception
     and cognition: The second Dartmouth multiperspective conference.
     Hillsdale, N.J.: Lawrence Erlbaum Associates, in press.






APPENDIX 



DTIC (Defense Technical Information Center) and ERIC (Educational Resources
Information Center) numbers:

Status Report                              DTIC            ERIC

SR-21/22    January - June 1970            [illegible]     ED 044-679
SR-23       July - September 1970          [illegible]     ED 052-654
SR-24       October - December 1970        AD 727616       ED 052-653
SR-25/26    January - June 1971            [illegible]     ED 056-560
SR-27       July - September 1971          [illegible]     ED 071-533
SR-28       October - December 1971        [missing]       ED 061-837
SR-29/30    January - June 1972            [missing]       ED 071-484
SR-31/32    July - December 1972           [missing]       ED 077-285
SR-33       January - March 1973           [missing]       ED 081-263
SR-34       April - June 1973              [illegible]     ED 081-295
SR-35/36    July - December 1973           [illegible]     ED 094-444
SR-37/38    January - June 1974            [illegible]     ED 094-445
SR-39/40    July - December 1974           [illegible]     ED 102-635
SR-41       January - March 1975           [illegible]     ED 109-722
SR-42/43    April - September 1975         [illegible]     ED 117-770
SR-44       October - December 1975        [illegible]     ED 119-273
SR-45/46    January - June 1976            [illegible]     ED 123-678
SR-47       July - September 1976          [illegible]     ED 128-870
SR-48       October - December 1976        [missing]       ED 135-028
SR-49       January - March 1977           [illegible]     ED 141-864
SR-50       April - June 1977              [illegible]     ED 144-138
SR-51/52    July - December 1977           [illegible]     ED 147-892
SR-53       January - March 1978           [illegible]     ED 155-760
SR-54       April - June 1978              AD A067070      ED 161-096
SR-55/56    July - December 1978           AD A065575      ED 166-757
SR-57       January - March 1979           AD A083179      ED 170-823
SR-58       April - June 1979              AD A077663      ED 178-967
SR-59/60    July - December 1979           AD A082034      ED 181-525
SR-61       January - March 1980           AD A085320      ED 185-636
SR-62       April - June 1980              AD A095062      ED 196-099
SR-63/64    July - December 1980           AD A095860      ED 197-416
SR-65       January - March 1981           AD A099958      ED 201-022
SR-66       April - June 1981              AD A105090      ED 206-038
SR-67/68    July - December 1981           AD A11085[?]    ED 212-010
SR-69       January - March 1982           AD A120819      ED 214-226
SR-70       April - June 1982              AD A119426      ED 219-834



Information on ordering any of these issues may be found on the following page.
**DTIC and/or ERIC order numbers not yet assigned.



AD numbers may be ordered from:            ED numbers may be ordered from:

U.S. Department of Commerce                ERIC Document Reproduction Service
National Technical Information Service     Computer Microfilm International
5285 Port Royal Road                         Corp. (CMIC)
Springfield, Virginia 22151                P.O. Box 190
                                           Arlington, Virginia 22210

Haskins Laboratories Status Report on Speech Research is abstracted in Language
and Language Behavior Abstracts, P.O. Box 22206, San Diego, California
92122.






Trading relations in the perception of speech by five-year-old children, by
Rick C. Robson, Barbara A. Morrongiello, Catherine T. Best, and Rachel
K. Clifton. Haskins Laboratories Status Report on Speech Research, 1982,
SR-70, 255-274.

p. 255 Acknowledgment:

NIMH Grant MH00332 to Rachel Clifton should be added.

p. 266 Paragraph 3, l. 5 should read:

p (correct) = {2 + [p ("say" on first member of comparison) - p ("say" on
second member)]^2 + [p ("stay" on first member) - p ("stay" on second
member)]^2} / 4.
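
As a worked illustration of the corrected expression, the short Python sketch below evaluates it for a pair of stimuli. It assumes that the damaged operators in the scanned erratum are plus signs, that each trailing "2" is a square, and that the final "4" is a divisor. The function and argument names are hypothetical and do not come from the report.

```python
def predicted_p_correct(say_first, say_second, stay_first=None, stay_second=None):
    """Predicted proportion correct from labeling proportions (illustrative only).

    say_first / say_second: proportion of "say" responses to the first and
    second member of the comparison; stay_* likewise for "stay" responses.
    """
    # Assumption: with only two response categories, each "stay" proportion
    # is the complement of the matching "say" proportion when not supplied.
    if stay_first is None:
        stay_first = 1.0 - say_first
    if stay_second is None:
        stay_second = 1.0 - say_second
    say_term = (say_first - say_second) ** 2
    stay_term = (stay_first - stay_second) ** 2
    return (2.0 + say_term + stay_term) / 4.0


if __name__ == "__main__":
    print(predicted_p_correct(0.5, 0.5))  # identical labeling: prints 0.5
    print(predicted_p_correct(1.0, 0.0))  # opposite labeling: prints 1.0
```

Under this reading, two stimuli that are labeled identically give chance performance (0.5), and two stimuli that are always labeled differently give perfect performance (1.0).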



UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

ORIGINATING ACTIVITY (Corporate author)

Haskins Laboratories
270 Crown Street
New Haven, CT 06510

2a. REPORT SECURITY CLASSIFICATION

Unclassified

2b. GROUP

N/A

Haskins Laboratories Status Report on Speech Research, SR-71/72, July -
December, 1982

Interim Scientific Report

5. AUTHOR(S) (First name, middle initial, last name)

Staff of Haskins Laboratories, Alvin M. Liberman, P.I.

REPORT DATE

December, 1982

7a. TOTAL NO. OF PAGES     7b. NO. OF REFS

356                        553

CONTRACT OR GRANT NO.

HD-01994          BNS-8111470
HD-16591          NS13870
N01-HD-1-2420     NS13617
RR-05596          NS18010
PRF-8006144       N00014-83-C-0083

ORIGINATOR'S REPORT NUMBER(S)

SR-71/72 (1982)

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)

None

10. DISTRIBUTION STATEMENT

Distribution of this document is unlimited.*

N/A

12. SPONSORING MILITARY ACTIVITY

See No. 8


13. ABSTRACT

This report (1 July-31 December) is one of a regular series on the status and
progress of studies on the nature of speech, instrumentation for its
investigation, and practical applications. Manuscripts cover the following
topics:

-Converging sources of evidence on spoken and perceived rhythms of speech:
 Cyclic production of vowels in sequences of monosyllabic stress feet
-Some differences between phonetic and auditory modes of perception
-Duplex perception: Confirmation of fusion
-On the kinematics of articulatory control as a function of stress and rate
-On simultaneous neuromuscular, movement, and acoustic measures of speech
 articulation
-The relation between pronunciation and recognition of printed words in deep
 and shallow orthographies
-Infant intermodal speech perception is a left hemisphere function
-Perceptual assessment of coarticulation in sequences of two stop consonants
-Subcategorical phonetic mismatches slow phonetic judgments
-Toward a dynamical account of motor memory and control
-Is the "cognitive penetrability" criterion invalidated by contemporary
 physics?
-Inadequacies of the computer metaphor
-Perceptual integration of spectral and temporal cues for stop consonant place
 of articulation: New puzzles
-Acoustic laryngeal reaction time: Foreperiod and stuttering severity effects
-Disinhibition of masking in auditory sensory memory
-Letter to the Editor, Journal of Phonetics
-Is it just reading? Comments on the papers by Mann, Morrison, and Wolford and
 Fowler
-Old problems and new directions in motor behavior. Book Review: Schmidt,
 R. A. Motor control and learning: A behavioral emphasis
-Discovering the sound pattern of a language: Review of Child Phonology,
 Vol. 1, Production, and Vol. 2, Perception, edited by Yeni-Komshian, G. H.,
 Kavanagh, J. F., & Ferguson, C. A.



DD FORM 1473 (PAGE 1)

*This document contains no information not freely available to the general
public. It is distributed primarily for library use.

UNCLASSIFIED
Security Classification



UNCLASSIFIED
Security Classification

14. KEY WORDS                         LINK A        LINK B        LINK C
                                      ROLE  WT      ROLE  WT      ROLE  WT

Speech Perception:

rhythm, timing, vowels, stress, monosyllables
phonetic, auditory, differences
duplex, fusion, binaural
visual, auditory, infants, left hemisphere
coarticulation, stop consonants
phonetic identification, reaction time,
coarticulation

Speech Articulation:

kinematics, stress, rate
muscles, neural, acoustic

Reading:

pronunciation, recognition, orthographies,
deep vs. shallow

Motor Control:

memory, motor, dynamic theory
computer metaphor, criticism
cognition, behavior, determinism,
dissipative systems

DD FORM 1473 (BACK)

UNCLASSIFIED
Security Classification



